Browsing by Author "Zewotir, Temesgen Tenaw."
Analysis of a binary response: an application to entrepreneurship success in South Sudan. (2012) Lugga, James Lemi John Stephen.; Zewotir, Temesgen Tenaw.
Just over half (50.6%) of the population of South Sudan lives on less than one US dollar a day, and three quarters of the population live below the poverty line (NBS, Poverty Report, 2010). Effective government policy to reduce unemployment and eradicate poverty generally focuses on stimulating new businesses. Micro and small enterprises (MSEs) are the major source of employment and income for many people in under-developed countries. The objective of this study is to identify the factors that determine business success and failure in South Sudan. To achieve this objective, generalized linear models, survey logistic models, generalized linear mixed models and multiple correspondence analysis are used. The data used in this study were generated from a business survey conducted in 2010. The response variable, business success or failure, was measured by the profit or loss of each business. Fourteen explanatory variables were identified as factors contributing to business success and failure. A main-effects model consisting of the fourteen explanatory variables and three interaction effects was fitted to the data. To account for the complexity of the survey design, survey logistic and generalized linear mixed models were then refitted using the same variables as the main-effects model. To confirm the results from these models, we used multiple correspondence analysis.

Analysis of discrete time competing risks data with missing failure causes and cured subjects. (2023) Ndlovu, Bonginkosi Duncan.; Zewotir, Temesgen Tenaw.; Melesse, Sileshi Fanta.
This thesis is motivated by the limitations of existing discrete time competing risks models in treating data that come with missing failure causes or a sizable proportion of cured subjects.
The discrete time models suggested to date (Davis and Lawrance, 1989; Tutz and Schmid, 2016; Ambrogi et al., 2009; Lee et al., 2018) are formulated in terms of cause-specific hazards. This alone disqualifies them from consideration when data come with missing failure causes. It is also well documented that naive application of cause-specific hazards to data with a sizable proportion of cured subjects may produce downward-biased estimates of these quantities. The existing models can be considered within the multiple imputation framework (Rubin, 1987) for handling missing failure causes, but the prospects of scaling them up to handle cured subjects are minimal, if not nil. In this thesis we address these issues concerning the treatment of missing failure causes and cured subjects in discrete time settings. Towards that end, we focus on the mixture model (Larson and Dinse, 1985) and the vertical model (Nicolaie et al., 2010), because these models possess properties that dovetail with the objectives of this thesis. The mixture model has already been extended to handle cured subjects. Nicolaie et al. (2015) demonstrated that the vertical model can also handle missing failure causes without modification, and Nicolaie et al. (2018) extended the vertical model to deal with cured subjects. Our strategy in this thesis is to use both the mixture model and the vertical model as launching pads for advancing discrete time models for data that come with missing failure causes or cured subjects.

Application of mixed model and spatial analysis methods in multi-environmental and agricultural field trials. (2015) Negash, Asnake Worku.; Mwambi, Henry Godwell.; Zewotir, Temesgen Tenaw.
Agricultural experimentation involves the selection of experimental materials and experimental units, the planning of experiments, the collection of relevant information, and the analysis and interpretation of the results.
The overall focus of this thesis is the importance, improvement and efficiency of variety contrasts obtained from linear mixed models with spatial variance-covariance structures, compared with the usual ANOVA methods of analysis. It also considers the recently widespread use of biplot analysis of genotype plus genotype-by-environment interaction (GGE) in the analysis of multi-environment crop trials. In addition, it applies a parametric bootstrap method for testing and selecting multiplicative terms in GGE and AMMI models, and presents statistical methods for handling missing data using multiple imputation, principal component and other deterministic approaches. Multi-environment agricultural experiments are unbalanced when some genotypes are not tested in some environments, or when measurements from some plots are missing during the experimental stage, so imputation of the missing values is sometimes necessary. Multiple imputation of missing data using the cross-validation by eigenvector method and PCA-based methods is applied. These methods have the advantages of easy computational implementation, no need for distributional or structural assumptions, and no restrictions regarding the pattern or mechanism of missing data in experiments. Genotype by environment (G×E) interaction is associated with the differential performance of genotypes tested at different locations and in different years, and influences the selection and recommendation of cultivars. Wheat genotypes were evaluated in six environments to determine the G×E interactions and the stability of the genotypes. Additive main effects and multiplicative interaction (AMMI) analysis was conducted for grain yield in both years, and showed that grain yield variation due to environments, genotypes and G×E interaction was highly significant. Stability for grain yield was determined using genotype plus genotype by environment interaction (GGE) biplot analysis.
The first two principal components (PC1 and PC2) were used to create a two-dimensional GGE biplot. The which-won-where pattern was based on six locations in the first year and five locations in the second year for all twenty genotypes. The resulting pattern is one realization among many possible outcomes: it differed in the second year, and its repeatability in a future year is unknown. Repeatability of the which-won-where pattern over years is the necessary and sufficient condition for mega-environment delineation and genotype recommendation. The thesis also examines the advantages of mixed models with spatial variance-covariance structures, and the direct implications of model choice for inference on varietal performance, ranking and testing, based on two multi-environment data sets from realistic national trials. A model comparison using a χ² test for the trials in the two data sets (wheat and barley) suggested that the selected spatial variance-covariance structures fitted the data significantly better than the ANOVA model. The forms of the optimally fitted spatial variance-covariance structures, the rankings and the consistency ratio tests differed from one trial (location) to another. Single-stage linear mixed model analysis including a spatial variance-covariance structure, with location as a grouping factor in the random model, also improved the estimation of true genotype effects and their ranking. The model also improved the estimation of varietal performance because of its capacity to handle additional sources of variation (location, and genotype-by-location (environment) interaction) and to accommodate local stationary trends.
Knowledge and understanding of statistical methods for the analysis of multi-environment data are particularly important for plant breeders and others working on plant variety improvement, to support proper selection and decision making at the next stage of improvement for national agricultural development.

Application of statistical multivariate techniques to wood quality data. (2010) Negash, Asnake Worku.; Mwambi, Henry Godwell.; Zewotir, Temesgen Tenaw.
Sappi is one of the leading producers and suppliers of Eucalyptus pulp to the world market. It is also a major contributor to the South African economy, through employment opportunities for rural people on its large plantations and through export earnings. The production of quality wood pulp by pulp mills is mainly affected by non-uniform raw material, namely the supply of Eucalyptus trees from various plantations. Improvement in pulp quality depends directly on improvement in the quality of the raw materials, so knowing the factors that affect pulp quality is important for tree breeders. Thus, the first objective of this research is to determine which anatomical, chemical and pulp properties of wood significantly affect the pulp properties viscosity, brightness and yield. Secondly, the study investigates the effects of differences in plantation location, site quality, tree age and species type on the viscosity, brightness and yield of wood pulp. To meet these objectives, data were obtained from Sappi's P186 trial and two other published reports from the Council for Scientific and Industrial Research (CSIR). Principal component analysis, cluster analysis, multiple regression analysis and multivariate linear regression analysis were used.
These methods were used to compare mean pulp quality measurements (viscosity, brightness and yield) for trees of different age, location, site quality and hybrid type. The results indicate that these four factors (age, location, site quality and hybrid type) and some anatomical and chemical measurements (fibre lumen diameter, kappa number, total hemicelluloses and total lignin) have a significant effect on pulp quality measurements.

Appraising South African residential property and measuring price developments. (2022) Bax, Dane Gregory.; Zewotir, Temesgen Tenaw.; North, Delia Elizabeth.
Housing wealth is well established as one of the most important sources of wealth for households and investors. At the same time, owning a home is a fundamental human need, which makes monitoring residential property prices a social endeavour as well as an economic one, especially in times of economic uncertainty. Residential property prices also have a direct effect on the macroeconomy through wealth effects, whereby gains in household balance sheets from increased equity lead to increased household consumption. Collecting correct and adequate data is vitally important in analysing property market movements and developments, particularly given globalization and the interlinked nature of financial markets. Although measuring residential property price developments is an important economic and social activity, matching properties over time is extremely difficult because homes are sold infrequently, their characteristics vary, and each home occupies a unique location. This thesis focuses on appraising several residential property types located throughout South Africa from January 2013 to August 2017, investigating different modelling approaches with the aim of developing a residential property price index.
Various methods exist to create residential property price indices; hedonic models have proven useful as a quality-adjusted approach that measures pure price changes rather than changes in the composition of samples over time. Before fitting any models to appraise homes, an autoencoder was built to detect anomalous data arising from human error at the data entry stage. The autoencoder identified improbable records, and once these and duplicate records were removed, the final data set contained 415 200 records. The study first investigated generalised linear models as a candidate approach for appraising homes in South Africa, and these showed possible alternatives to the ubiquitous log-linear model. To relax functional form assumptions and take account of the nested locational structure of homes, hierarchical generalised additive models were considered as the next candidate method. Partitioning around medoids was applied to find additional spatial groupings, which were treated as random effects along with the suburb. The findings showed that the marginal utility of structural attributes was non-linear and that smooth functions of covariates were an appropriate treatment. Furthermore, the use of random effects helped account for the spatial heterogeneity of homes through partial pooling. Finally, machine learning algorithms were investigated because they make minimal assumptions about the data-generating process and can capture complex non-linear and interaction effects. Random forests, gradient boosted machines and neural networks were adopted to fit these appraisal functions. The gradient boosted machines had the best goodness of fit, revealing non-linear relationships between the structural characteristics of homes and listing prices. Partial dependence plots quantified the marginal utility over the distributions of different structural characteristics.
The results show that larger homes do not necessarily yield a premium and that diminishing returns are evident, similar to the results of the hierarchical generalised additive models. The variable importance plots showed that location was the most important predictor, followed by the number of bathrooms and the size of a home. The gradient boosted machines achieved the lowest out-of-sample error and were used to develop the residential property price index. A chained, dual imputation Fisher index was constructed from the gradient boosted machines, showing nominal and real price developments at national and provincial level. The chained, dual imputation Fisher index provided less noisy estimates than a simple median mix-adjusted index. Although listing prices rather than transaction prices were used, the trend was similar to that of the ABSA Global Property Guide. To make this research useful to property market participants, a web application was developed to show how the proposed methodology can be democratised by property portals and real estate agencies. The Listing Price Index Calculator was created to communicate the results easily through a front-end interface, showing how property portals and real estate agencies can leverage their data to help sellers set the listing prices they go to market with, help buyers obtain an average estimate of the value of the home they wish to purchase, and guide property market participants on price developments.

An assessment of modified systematic sampling designs in the presence of linear trend. (2017) Naidoo, Llewellyn Reeve.; North, Delia Elizabeth.; Zewotir, Temesgen Tenaw.; Arnab, Raghunath.
Sampling is used to estimate population parameters, as it is usually impossible to study a whole population owing to time and budget restrictions. Various sampling designs address this issue, and this thesis is concerned with a particular probability sampling design known as systematic sampling.
Systematic sampling is operationally convenient and efficient, and hence is used extensively in most practical situations. The shortcomings associated with systematic sampling include: (i) it is impossible to obtain an unbiased estimate of the sampling variance when conducting systematic sampling with a single random start; (iii) if the population size is not a multiple of the sample size, then conventional systematic sampling, also known as linear systematic sampling, may result in variable sample sizes. In this thesis, I aim to contribute to the current body of knowledge by proposing modifications to the systematic sampling design that address these shortcomings. Firstly, a discussion of the measures used to compare the various probability sampling designs is provided, before the general theory of systematic sampling is reviewed. The performance of systematic sampling depends on the population structure; hence, this thesis concentrates on a specific and common population structure, namely linear trend. A discussion of the performance of linear systematic sampling and all related modifications, including a newly proposed modification, is then presented under the assumption of a linear trend among the population units. For each of the above-mentioned problems, the associated sampling designs in the existing literature are briefly reviewed, along with my proposed modified design. Thereafter, I introduce a modified sampling design that addresses the above-mentioned problems in tandem, before providing a comprehensive report on the thesis. The aim of this thesis is to provide solutions to the above-mentioned disadvantages by proposing modified systematic sampling designs and/or estimators that compare favourably with their counterparts in the existing literature.
Keywords: systematic sampling; super-population model; Horvitz-Thompson estimator; Yates' end corrections method; balanced modified systematic sampling; multiple-start balanced modified systematic sampling; remainder modified systematic sampling; balanced centered random sampling.

Bayesian spatial modeling of malnutrition and mortality among under-five children in sub-Saharan Africa. (2019) Adeyemi, Rasheed Alani.; Zewotir, Temesgen Tenaw.; Ramroop, Shaun.
The aim of this thesis is to develop and extend Bayesian statistical models in the area of spatial modeling and apply them to child health outcomes, with particular focus on malnutrition and mortality among under-five children. The easy availability of geo-referenced databases has stimulated a paradigm shift in methodological approaches to spatial analysis. This study reviewed the spatial methods and disease mapping models developed for areal (lattice) data analysis. Observational data collected from complex survey designs and geographical locations often violate the independence assumption of classical regression models. By relaxing the restrictive linearity and normality assumptions of classical regression models, this study first developed a flexible semi-parametric spatial model that accommodates the usual fixed effects, nonlinear effects and a geographical component in a unified model. The approach was explored in the analysis of spatial patterns of childbirth outcomes in Nigeria. The study also addressed the issue of disease clustering, which is of interest to epidemiologists and public health officials. The study then proposed a Bayesian hierarchical analysis approach for Poisson count data and formulated a Poisson version of generalized linear mixed models (GLMMs) for analyzing childhood mortality. The model simultaneously addressed the problems of overdispersion and spatial dependence by including the risk factors and random effects in a single model.
The proposed approach identified regions with elevated relative risk or clustering of high mortality, and evaluated small-scale geographical disparities in sub-populations across the regions. The study identified further challenges in spatial data analysis, namely spatial autocorrelation and model misspecification. The study then fitted geoadditive mixed (GAM) models to analyze childhood anaemia data with responses from the exponential family of distributions (Gaussian, binary and multinomial). The GAM models extend generalized linear mixed models by allowing the inclusion of splines for continuous covariate (or time) trends alongside the parametric terms. Lastly, the shared component model, originally developed for multiple disease mapping, was reviewed and modified to suit the binary data at hand. A multivariate conditional autoregressive (MCAR) model was developed and applied to jointly analyze three child malnutrition indicators. The approach facilitated estimation of the conditional correlation between the diseases, and assessment of the spatial association across regions and of geographical variation in the prevalence of each disease. The spatial analysis presented in this thesis is useful to inform health-care policy and resource allocation. This thesis contributes to methodological applications in life sciences, environmental sciences, public health and agriculture, and expands the existing methods and tools for health impact assessment in public health studies. KEYWORDS: Conditional Autoregressive (CAR) model, Disease Mapping Models, Multiple Disease mapping, Health Geography, Ecology Models, Spatial Epidemiology, Childhood Health outcomes.

Classification of banking clients according to their loan default status using machine learning algorithms. (2022) Reddy, Suveshnee.; Chifurira, Retius.; Zewotir, Temesgen Tenaw.
Lending has become crucial for both individuals and companies.
For lending institutions, although profitable, it can be very risky because clients may default on their loan agreements. Credit risk assessment is a critical process carried out by most lending institutions; it reduces the possibility of lending to clients who will default on their loan repayments, but it does not eliminate the problem. Thus, a collections process that aims to retrieve unpaid debt is also necessary. With South Africa facing another recession, worsened by the lockdown during the COVID-19 pandemic, lending institutions can expect an increase in the number of loan defaulters. To counter this increase, changes will have to be made to their policies and processes, either to the loan application procedures (e.g. credit risk assessment, affordability assessment et cetera) or to the post-disbursal procedures (e.g. collections processes). The aim of this study is to predict whether a client will default on his/her loan, using machine learning algorithms, in order to enhance the collection process of the financial institution under study, where default is defined as missing at least three payments in the first 12 months after the loan is granted. The logistic regression model, decision tree, random forest, support vector machine, Naïve Bayes classifier, k-nearest neighbours algorithm and artificial neural network were fitted to the balanced dataset. Loan data from a South African financial institution for the period August 2019 to December 2019 were used. Variables related to a client's demographics, income, expenses and debt, as well as loan information, were included in the dataset. Exploratory data analysis (EDA) was used to analyse the dataset and summarise its main characteristics.
To reduce the dimensionality of the dataset, two techniques were used: principal component analysis (PCA), which also corrects the data for multicollinearity, and feature selection (i.e., recursive feature elimination). Each model was fitted to the dataset using each of these two techniques, and the confusion matrix and metrics such as balanced accuracy, true positive rate, true negative rate, AUC score and the Gini coefficient were used to evaluate the models and determine which performed best and was most suited to this application. The results show that with the PCA approach, the random forest model performed best, obtaining a balanced accuracy score, true positive rate and AUC score of 0.69, 0.74 and 0.74, respectively. The random forest model also performed best with the feature selection technique, obtaining a balanced accuracy score, true positive rate and AUC score of 0.69, 0.74 and 0.75, respectively. Comparing the random forest model using PCA with the random forest model using feature selection showed only a marginal difference in each performance metric analysed. The random forest model using PCA used 48 variables, whereas the random forest model using feature selection used only 18 variables, and the latter therefore seemed more suitable for the classification problem under study. The results of this study are expected to benefit analysts and data scientists in financial institutions who wish to identify robust machine learning algorithms for classifying defaulting clients. This study is also of significance to policy makers who wish to identify the risk factors associated with loan defaulting clients.

Co-morbidity of childhood anaemia and malaria with a district-level spatial effect. (2021) Roberts, Danielle Jade.; Zewotir, Temesgen Tenaw.
Anaemia and malaria are the leading causes of sub-Saharan African childhood morbidity and mortality.
This thesis aimed to explore the risk factors for, as well as the complex relationship between, anaemia and malaria in young children across the districts or counties of four contiguous sub-Saharan African countries, namely Kenya, Malawi, Tanzania and Uganda. Nationally representative data from the Demographic and Health Surveys conducted in all four countries were used. The observed prevalence of anaemia and malaria was 52.5% and 19.7%, respectively, with a 15.1% prevalence of co-infection. Machine learning-based exploratory classification methods were used to gain insight into the relationships and patterns among the explanatory variables and the two responses. The administrative districts are the level at which public health decisions are made within each of the countries. Accordingly, the best linear unbiased predictor (BLUP) ranking and selection approach was adopted to investigate the district-level spatial effects, while controlling for child-level, household-level and environmental factors. Further to the geoadditive model, a generalised additive mixed model with a spatial effect based on the geographical coordinates of the sampled clusters within the districts was applied. The relationship between the two diseases was further explored using joint modelling approaches: a bivariate copula geoadditive model and a shared component model. The child's age, mother's education level, household wealth index and cluster altitude were found to be significantly associated with both the anaemia and malaria status of the child. The results of this study can help policy makers target the correct set of interventions and avoid incorrect interventions for anaemia and malaria control and prevention.
This aids the targeted allocation of limited district health system resources within each of these countries.

Count data modelling application. (2019) Ibeji, Jecinta Ugochukwu.; North, Delia Elizabeth.; Zewotir, Temesgen Tenaw.
The rapid increase in total children ever born, without proportionate growth in the Nigerian economy, has been a concern, and making predictions with count data requires an appropriate regression model. As count data take discrete, non-negative values, the Poisson distribution is a natural choice to describe them, but it is restrictive because it assumes that the variance equals the mean. When the data are under- or over-dispersed, this restriction biases the estimated standard errors and renders the test statistics incorrect. This study aimed to model count data, applied to total children ever born, using negative binomial and generalized Poisson regression. The Nigeria Demographic and Health Survey 2013 data on women aged 15-49 years were used, and three models were applied to investigate the factors affecting the number of children ever born. Predictive count modelling was also carried out and evaluated using root mean square error, mean absolute error, R-squared and mean square error. In the inferential modelling, the generalized Poisson model was found to be superior, with age of household head (P < .0001), age of respondent at first birth (P < .0001), urban-rural status (P < .0001) and religion (P < .0001) significantly associated with total children ever born. In the predictive modelling, all three models showed almost identical performance metrics, but Poisson regression was chosen as the best because it is the simplest model.
In conclusion, early marriage, religious beliefs and lack of awareness among women living in rural areas should be addressed in order to control total children ever born in Nigeria.

Covariates and latents in growth modelling. (2014) Melesse, Sileshi Fanta.; Zewotir, Temesgen Tenaw.
Growth curve models are the natural models for incremental processes that take place gradually over time. When individuals are observed over time, it is often apparent that they grow at different rates, even when they are clones and no differences in treatment or environment are present. Nevertheless, the classical growth curve model deals only with average growth and does not account for individual differences, nor does it have room to accommodate covariates. Accordingly, we strive to construct and investigate tractable models that incorporate both individual effects and covariates. The study was motivated by plantations of fast-growing tree species, and by the climatic and genetic factors that influence stem radial growth of juvenile Eucalyptus hybrids grown on the east coast of South Africa. Stem radius was measured using dendrometers on eighteen sampled trees of two Eucalyptus hybrid clones (E. grandis × E. urophylla, GU, and E. grandis × E. camaldulensis, GC). Climatic data (temperature, rainfall, solar radiation, relative humidity and wind speed) were collected simultaneously at the study site. We explored various functional statistical models that are able to handle the growth, individual traits and covariates. These models include partial least squares approaches, principal component regression, path models, fractional polynomial models, nonlinear mixed models and additive mixed models. Each of these models has strengths and weaknesses. The models are applied by analysing the stem radial growth data.
The partial least squares and principal component regression methods were used to identify the most important predictor of stem radial growth. A path model approach was then applied, mainly to identify indirect effects of climatic factors. We further explored the tree-specific effects that are unique to a particular tree under study by fitting a fractional polynomial model in the context of a linear mixed effects model. The fitted fractional polynomial model showed that the relationship between stem radius and tree age is nonlinear. The performance of fractional polynomial models was compared with that of nonlinear mixed effects models, which were used to estimate growth parameters such as inflection points. The fractional polynomial model fit was almost as good as that of the nonlinear growth curves. Consequently, the fractional polynomial model was extended to include the effects of all climatic variables. However, parametric methods do not allow the data to determine the most suitable form of the functions. To capture the main features of the longitudinal profiles more flexibly, a semiparametric approach was adopted: specifically, additive mixed models were used to model the effect of tree age as well as the effect of each climatic factor.

Determinants of optimal adherence over time to antiretroviral therapy amongst HIV positive adults in South Africa: a longitudinal study. (Springer Verlag, 2011) Maqutu, Dikokole.; Zewotir, Temesgen Tenaw.; North, Delia Elizabeth.; Grobler, Anna Christina.
Highly active antiretroviral therapy (HAART) requires strict adherence to achieve optimal clinical and survival benefits. This study explored the factors affecting HAART adherence among HIV positive adults by reviewing routinely collected patient information in the Centre for the AIDS Programme of Research in South Africa's (CAPRISA) AIDS Treatment Programme.
Records of 688 patients enrolled between 2004 and 2006 were analysed. Patients were considered adherent if they had taken at least 95% of their prescribed drugs. Generalized estimating equations were used to analyse the data. The results showed that HAART adherence increased over time; however, the rate of increase differed by some of the socio-demographic and behavioural characteristics of the patients. For instance, HAART adherence increased over time at both the urban and the rural treatment sites, but the rate of increase was higher at the rural site. This helped identify sub-populations, such as the urban population, that required ongoing adherence counselling.
Item Factors affecting first-month adherence to antiretroviral therapy among HIV-positive adults in South Africa.(Taylor & Francis co-published with NISC., 2010) Maqutu, Dikokole.; Zewotir, Temesgen Tenaw.; North, Delia Elizabeth.; Naidoo, Kogieleum.; Grobler, Anna Christina.
This study explores the influence of baseline factors on first-month adherence to highly active antiretroviral therapy (HAART) among adults. The study design involved a review of routinely collected patient information in the CAPRISA AIDS Treatment (CAT) programme at a rural and an urban clinic in KwaZulu-Natal Province, South Africa. The records of 688 patients enrolled in the CAT programme between June 2004 and September 2006 were analysed. Adherence was calculated from pharmacy records (pill counts), and patients were considered adherent if they had taken at least 95% of their prescribed drugs. Logistic regression was used to analyse the data and account for confounding factors. During the first month of therapy, 79% of the patients were adherent to HAART. HAART adherence was negatively associated with a higher baseline CD4 count.
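The 95% adherence rule used in these two studies can be written as a small function that turns pill-count records into the binary outcome later modelled by logistic regression or generalized estimating equations. This is a minimal sketch assuming that pills taken equals pills dispensed minus pills returned, with the ratio capped at 1; the function names and the capping are illustrative assumptions, not the CAPRISA programme's documented procedure.

```python
def pill_count_adherence(dispensed: int, returned: int, prescribed: int) -> float:
    """Share of prescribed doses taken in a period, estimated from a pill count.

    Assumes every pill that was dispensed and not returned was taken.
    The ratio is capped at 1.0 so that surplus dispensing cannot exceed 100%.
    """
    if prescribed <= 0:
        raise ValueError("prescribed must be positive")
    if returned < 0 or returned > dispensed:
        raise ValueError("returned must lie between 0 and dispensed")
    taken = dispensed - returned
    return min(taken / prescribed, 1.0)


def is_adherent(dispensed: int, returned: int, prescribed: int,
                threshold: float = 0.95) -> bool:
    """Binary adherence outcome: at least `threshold` of prescribed drugs taken."""
    return pill_count_adherence(dispensed, returned, prescribed) >= threshold
```

For example, 60 prescribed doses with 2 of 60 dispensed pills returned gives 58/60 (about 0.967), which counts as adherent, whereas 4 returned pills (about 0.933) does not.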
Women had better adherence if they attended voluntary testing and counselling or if they had taken an HIV test because they were unwell, while men had better adherence if they were tested because of a perceived risk of HIV infection. HAART adherence was positively associated with older age among patients who possessed cell phones and among patients who provided a source of income in the urban setting, but not in the rural setting. Although long-term data from this cohort are required to fully evaluate the impact of non-adherence in the first month of treatment, this study identifies specific groups of patients at higher risk for whom adherence counselling should be targeted and tailored. For example, first-month HAART adherence can be improved by targeting patients who initiate treatment with a high CD4 count.
Item Flexible statistical modelling in food insecurity risk assessment.(2015) Lokosang, Laila Barnaba.; Ramroop, Shaun.; Zewotir, Temesgen Tenaw.
Food insecurity has remained a persistent problem in Sub-Saharan Africa. Conflict and other protracted crises have left a significant proportion of Africa’s populations at risk of food insecurity as their resilience to livelihood shocks weakens. A large body of research over the past two decades has centred on describing the incidence of food insecurity and vulnerability. Little research has used statistical methods to determine the likelihood of food insecurity risk, and the use of flexible statistical techniques for sound and purposive monitoring, evaluation, planning and decision making in food security and resilience has been limited. The study aimed to extend the use of statistics into the expanding field of food security and resilience, and to provide new directions for future research involving applications of the methods explored, such as adjustments in statistical methods, sampling and data collection.
Specifically, the study aims to provide food security analysts with tested and statistically robust tools for analysing the likelihood of food insecurity risk in settings with structural food insecurity. It also aims to inform practice, policy and analysis in the monitoring and evaluation of food insecurity risk in protracted crises, thereby helping to improve risk aversion measures. Using secondary data, the research examines relevant statistical techniques for determining predictors of food insecurity risk, namely Principal Component Analysis, Multiple Correspondence Analysis, Classification and Regression Tree Analysis, Survey Logistic Regression, Generalized Linear Mixed Models for Ordered Categorical Data, and Joint Modelling. The study took the form of structured analyses of different datasets collected in conflict-ridden South Sudan. Assets owned by households, as well as the availability of livelihood endowments, were used as a proxy for the level of resilience in a particular demographic unit or geographical setting. The study highlighted the strengths and weaknesses of the techniques explored in identifying or classifying potential predictors of food insecurity outcomes. Each technique can generate a unique composite index for measuring the amount of resilience and for predicting and classifying households by food insecurity phase based on factor loadings. In general, each method explored was found to have particular strengths as well as limitations.
However, a noteworthy implication is that asset-based statistical analysis, whether based on a composite index used as a proxy for the amount of resilience to food insecurity eventualities or on regression modelling approaches, does assure sufficient rigour in drawing conclusions about the wellbeing of the households or populations under study and how they might withstand food insecurity and livelihood shocks. As food insecurity and malnutrition continue to attract substantial attention, such flexible analytical approaches are potentially useful for determining food insecurity risks, especially in protracted crisis settings.
Item Imputation for nonresponse using the Annual Financial Statistics Survey.(2011) Singh, Smeeta.; North, Delia Elizabeth.; Zewotir, Temesgen Tenaw.
In this dissertation, we focus on the Annual Financial Statistics (AFS) survey, which is conducted by Statistics South Africa, the national statistics office of South Africa. The main purpose of this survey is to provide information for compiling GDP estimates, value added and its components, which are used to monitor and develop government policies. The AFS covers a sample of private enterprises operating in the formal non-agricultural business sector of the South African economy, excluding financial, insurance and government institutions. Quality is a key area of importance in this organisation, so methodology and standards need to be monitored, evaluated and reviewed regularly. This helps ensure that Statistics South Africa follows international best practice in the collection and estimation of official statistics.
We focus on nonresponse in the AFS survey and investigate an alternative method of adjusting for it, with the aim of improving how nonresponse is handled and thereby improving the estimates from the AFS survey.
Item Investigation of fertilizer usage in Malawi within the rural livelihood diversification project using generalized linear models and quantile regression.(2013) Kabuli, Hilda Janet Jinazali.; Zewotir, Temesgen Tenaw.; Ndlovu, Principal.
Malawi’s economy relies heavily on agriculture, which is threatened by declining soil fertility. Measures to increase crop productivity at household level include the increased use of inorganic fertilizers. To supplement the Government’s efforts to ensure food security, the Rural Livelihood Diversification Project (RLDP) was implemented in Kasungu and Lilongwe Districts in Malawi. The RLDP aimed to increase access to and use of inorganic fertilizers. We used data collected by the International Center for Tropical Agriculture (CIAT) to investigate whether the project’s interventions had any significant impact. A general linear model was initially used to model the data, with terms selected using the automatic stepwise procedure in the GLMSELECT procedure of SAS. Other models included a general linear model with a transformed response, a gamma model with a log link and, alternatively, an inverse link, and quantile regression. These models were used to model the amount of fertilizer used per acre given a set of fixed-effect predictors, where households were sampled only at baseline or at the impact assessment study. The general linear model violated the assumptions of normality and constant variance, and the gamma model was affected by influential observations. The quantile regression model is robust to outliers and influential observations.
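Quantile regression replaces the squared-error loss with the check (pinball) loss, which grows only linearly in the size of a residual; this is one reason it is less sensitive to outliers than least squares. The minimal sketch below, with hypothetical helper names, shows only the intercept-only case: minimising the check loss at tau = 0.25 recovers a 25th percentile, which is how a tau = 0.25 model targets the households applying the least fertilizer. With covariates, the same loss is minimised over the regression coefficients, usually by linear programming in dedicated software.

```python
def check_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss rho_tau(u) = u * (tau - I(u < 0))."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def best_constant_quantile(y: list, tau: float) -> float:
    """Constant q minimising the total check loss over the sample.

    The objective is convex and piecewise linear in q with kinks at the data,
    so a minimiser is always one of the observed values; searching over them
    is enough for this toy, covariate-free case.
    """
    def total_loss(q: float) -> float:
        return sum(check_loss(yi - q, tau) for yi in y)

    return min(y, key=total_loss)
```

For the values 1.0 to 9.0, best_constant_quantile(..., 0.25) returns 3.0, which matches the 25th percentile of that sample.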
Quantile regression showed that the number of plots cultivated, timeline, the interaction between household saving and irrigation, and the interaction between plots and timeline significantly affected the amount of fertilizer applied per acre among the 25% of households that apply the lowest levels of fertilizer per acre.
Item Longitudinal analysis of the effect of climatic factors on the wood anatomy of two eucalypt clones.(2010) Ayele, Dawit Getnet.; Zewotir, Temesgen Tenaw.; Ndlovu, Principal.
Eucalypts are among the tree species used to manufacture paper in South Africa. Paper manufacture involves cooking wood with chemicals until a pulp is obtained. Wood is made up of different cells, and the shape and structure of these cells, called wood anatomical characteristics, are important for paper quality. The anatomical characteristics of wood are also influenced by environmental factors such as climate and soil composition. In this study we investigated the effects of climatic factors (temperature, rainfall, solar radiation, relative humidity and wind speed) on the wood anatomical characteristics of two Eucalyptus clones, a GC (Eucalyptus grandis × camaldulensis) and a GU (Eucalyptus grandis × urophylla). Nine trees per clone were selected, and two sets of data were collected. The first comprised eleven anatomical characteristics of the wood formed daily over a period of five years; the second comprised daily measurements of temperature, rainfall, solar radiation, relative humidity and wind speed in the experimental area. Wood consists of two kinds of cell: fibres, which provide strength and support for the tree, and vessels, which transport water and nutrients. Eleven characteristics related to these cells were measured (diameter, wall thickness and frequency). These characteristics are highly correlated.
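Reducing eleven highly correlated measurements to a few components amounts to choosing how many principal components reach a target share of the total variance. The NumPy sketch below, with a hypothetical function name, standardises the columns first (PCA on the correlation matrix, an assumption made here because the variables are on different scales) and returns the smallest number of components whose cumulative variance reaches the target.

```python
import numpy as np


def n_components_for(X: np.ndarray, target: float = 0.95):
    """Smallest number of principal components reaching `target` cumulative variance.

    Columns are standardised, so this is PCA on the correlation matrix.
    Returns (k, cumulative_proportions), with proportions in descending-eigenvalue order.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # eigvalsh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index at which the cumulative share reaches the target.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(eigvals)), cumulative
```

Naming the retained components (for instance as vessel or fibre dimensions) still requires inspecting their loadings, which this sketch does not compute.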
To reduce the number of response variables, principal component analysis was used; the first four principal components account for about 95% of the total variation. Based on the weights associated with each component, the first four principal components were labelled vessel dimension (VD), fibre dimension (FD), fibre wall (FW) and vessel frequency (VF). A longitudinal linear mixed model, with age, season, temperature, rainfall, solar radiation, relative humidity and wind speed as fixed effects and tree as a random effect, was fitted to the data. Time series modelling identified the lag orders of the climatic variables, and these lagged climatic variables were included in the model. To account for the physical characteristics of the trees, we also included diameter at breast height, stem radius, daily radial increment, and the suppression or dominance of the tree. The wood anatomical characteristics of the two clones were found to be more affected by climatic variables at the juvenile stage than at the mature stage.
Item Longitudinal clinical covariates influence on CD4+ cell count after seroconversion.(2019) Tinarwo, Partson.; Zewotir, Temesgen Tenaw.; North, Delia Elizabeth.
The Acquired Immunodeficiency Syndrome (AIDS) pandemic is a global challenge. The human immunodeficiency virus (HIV) is notorious for weakening the immune system and opening channels for opportunistic infections. CD4+ (cluster of differentiation 4) cells are the cells mainly killed by HIV, and the CD4+ count is therefore used as a health indicator for HIV-infected patients. In the past, CD4+ count diagnostics were very expensive and therefore beyond the reach of many in resource-limited settings, so clinical covariates of the CD4+ count served as potential diagnostic tools. From a different angle, it is essential to trace how the clinical covariates affect the CD4+ cell response.
That is, inasmuch as the immune system regulates fluctuations in the CD4+ count in reaction to the viral invasion, the body’s other complex functional systems are bound to adjust too. However, little is known about the corresponding adaptive patterns in the influence of the clinical covariates on the CD4+ cell count. This study used data from the Centre for the AIDS Programme of Research in South Africa (CAPRISA), where HIV-negative patients were initially enrolled into different cohorts with different objectives and followed up in their respective cohort studies. As soon as a patient seroconverted in any of the cohort studies, the patient was enrolled again, into a new cohort of HIV-positive patients only. Follow-up of the seroconverters involved simultaneous repeated measurements of the CD4+ count and 46 clinical covariates. An extensive exploratory analysis was then performed with three variable reduction methods for high-dimensional longitudinal data to identify the strongest clinical covariates. The sparse partial least squares approach proved the most appropriate and robust technique. It identified the 18 strongest clinical covariates, which were subsequently used to fit more sophisticated statistical models, including longitudinal multilevel models for assessing inter-individual variation in the CD4+ count due to each clinical covariate. Generalised additive mixed models were then used to gain insight into CD4+ count trends and possible adaptive optimal set-points of the clinical covariates. Segmented regression models were employed to single out break-points at which the linear relationship between the CD4+ count and a covariate changes.
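A segmented regression with one break-point can be fitted by profiling over candidate break-points: for each candidate psi, fit y = b0 + b1*x + b2*max(x - psi, 0) by least squares and keep the psi with the smallest residual sum of squares. The sketch below uses this simple grid search as a swapped-in illustration rather than the thesis's estimation method (iterative approaches such as the one in the R package segmented are common in practice). The function name is hypothetical, and the example data are synthetic, not CAPRISA measurements.

```python
import numpy as np


def fit_one_breakpoint(x, y, candidates):
    """Grid-search a single break-point in a continuous piecewise-linear model.

    Model: y = b0 + b1 * x + b2 * max(x - psi, 0) + error, so b2 is the change
    in slope at psi. Returns (psi, coefficients, residual sum of squares).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for psi in candidates:
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - psi, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best[2]:
            best = (float(psi), coef, rss)
    return best
```

Two break-points, as reported for sodium, would need a second hinge term and a search over pairs of candidates.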
To understand the highly complex and intertwined relationships between the CD4+ count, the clinical covariates and the time-lagged effects during HIV disease progression, a Structural Equation Model (SEM) was constructed and fitted. The results showed that sodium consistently changed its effects at 132 mEq/L and 140 mEq/L across all post-infection phases. In general, the influence of the covariates on the CD4+ count varied with infection phase and varied widely between individuals during antiretroviral therapy (ART). We conclude that there is evidence of adaptive covariate set-point behaviour that positively influences the CD4+ cell count during HIV disease progression.
Item Longitudinal survey data analysis.(2006) Nasirumbi, Pamela Opio.; Zewotir, Temesgen Tenaw.
To investigate the effect of environmental pollution on the health of children in the Durban South Industrial Basin (DSIB), given its proximity to industrial activities, 233 children from five primary schools were considered. Three of these schools were located in the south of Durban, while the other two were in northern residential areas that were closer to industrial activities. Data collected included the participants' demographic, health, occupational, social and economic characteristics. In addition, environmental information, specifically the levels of some ambient air pollutants, was monitored throughout the study. The objective of this thesis is to investigate which of these factors affected the children's lung function. To achieve this objective, different sample survey data analysis techniques are investigated, including design-based and model-based approaches. The nature of the survey data finally leads to the longitudinal mixed model approach.
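For the residual covariance choice described in this abstract, compound symmetry assumes one common variance and one common covariance between any two occasions, so it has two parameters, whereas an unstructured matrix for n occasions has n(n+1)/2. The sketch below, with hypothetical function names, builds a CS matrix and the AIC formula used to compare candidate structures. In practice the log-likelihoods come from the mixed-model software, structures are compared with the same fixed effects, and software conventions differ on which parameters are counted.

```python
import numpy as np


def compound_symmetry(n_times: int, sigma2: float, rho: float) -> np.ndarray:
    """Compound symmetric covariance for one subject's n_times repeated measures.

    Every occasion has variance sigma2 and every pair of occasions has
    covariance rho * sigma2, regardless of how far apart they are.
    Positive definiteness needs -1/(n_times - 1) < rho < 1.
    """
    identity = np.eye(n_times)
    ones = np.ones((n_times, n_times))
    return sigma2 * ((1.0 - rho) * identity + rho * ones)


def unstructured_n_params(n_times: int) -> int:
    """Free parameters in an unstructured covariance matrix (CS has only 2)."""
    return n_times * (n_times + 1) // 2


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; the smaller value is preferred."""
    return 2.0 * n_params - 2.0 * log_likelihood
```

With five occasions, for example, an unstructured residual matrix needs 15 parameters against 2 for CS, so AIC favours CS unless the extra parameters improve the log-likelihood by enough to offset the 2-per-parameter penalty.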
Multicollinearity between the pollutant variables leads to fitting two separate models: one with the peak counts as the independent pollutant measures and the other with the 8-hour maximum moving average as the independent pollutant variables. To select the fixed-effects structure, a scatter-plot smoother known as the loess fit is applied to the individual profile plots of the response variable. The random effects and the residual effect are assumed to have different covariance structures: the unstructured (UN) covariance structure is used for the random effects, while the compound symmetric (CS) structure, selected using the Akaike information criterion (AIC), is used for the residual effects. To check the model fit, the profiles of the fitted and observed values of the dependent variables are compared graphically. The data also exhibit intermittent missingness, and the missingness mechanism is investigated with a missing at random (MAR) test based on a modified logistic regression model. The results indicate that school location, sex and weight are the significant factors for the children's respiratory conditions. More specifically, children in schools located in the northern residential areas are found to have poorer respiratory conditions than those in the Durban-South schools. Poor respiratory conditions are also identified for overweight children.
Item The management of missing categorical data : comparison of multiple imputation and subset correspondence analysis.(2015) Hendry, Gillian Margaret.; Zewotir, Temesgen Tenaw.; Naidoo, Rajen.; North, Delia Elizabeth.
Missing data are a common problem in research, and the manner in which this ‘missingness’ is managed is crucial to the validity of analysis outcomes. This study illustrates the use of two distinct methods for handling missing data, in particular missing categorical data.
These methods are applied to a dataset intended to identify relationships between asthma severity in children and environmental, behavioural, genetic and socio-economic factors. This dataset suffered from substantial missingness. The first method applied two approaches to multiple imputation, each adopting a different distributional specification. A previously undocumented practical challenge arose in applying multiple imputation: interactions that were to be identified and included in the analysis model were also needed in the imputation model. This study found that imputing a single complete dataset using the expectation maximization (EM) algorithm for covariance matrices made it possible to identify relevant interactions for inclusion in the imputation model. The second method illustrated the application of correspondence analysis to a subset of the data that includes only the measured data categories. The application of subset correspondence analysis (s-CA) to incomplete data, as well as its sensitivity to the type of missingness, has not been well documented, if at all, and there is no evidence of prior research in which interactions have been added to an analysis with s-CA. This study illustrated its use both with and without interactions, and its results were found to be similar to, and favourably complementary with, those from the multiple imputation approach. A simulation study found that s-CA performed well with any type of missingness, provided that less than 30% of values are missing on any variable with incomplete data. Across all analyses, the relationships found between asthma severity and the factors were consistent with known relationships, confirming the reliability of the methods.