Masters Degrees (Statistics)
Permanent URI for this collection: https://hdl.handle.net/10413/7127
Browsing Masters Degrees (Statistics) by Title
Now showing 1 - 20 of 107
Item: Age, period and cohort analysis of young adult mortality due to HIV and TB in South Africa: 1997-2015. (2019) Nkwenika, Tshifhiwa Mildred.; Mwambi, Henry Godwell.; Manda, Samuel.

Young adult mortality is a major concern in South Africa, given the impact of Human Immunodeficiency Virus/Acquired Immune Deficiency Syndrome (HIV/AIDS), Tuberculosis (TB), injuries and emerging non-communicable diseases (NCDs). Investigation of temporal trends in adult mortality associated with TB and HIV has often been based on age, gender, period and birth cohort separately. The overall aim of this study was to estimate the effect of age across period and birth cohort, of period across age and birth cohort, and of birth cohort across age and period on TB- and HIV-related mortality. Mortality data and mid-year population estimates were obtained from Statistics South Africa for the period 1997 to 2015. Observed HIV/AIDS deaths were adjusted for under-reporting, and adjustments were made for the misclassification of AIDS deaths and for the proportion of ill-defined natural causes. Three-year age, period and birth cohort intervals were used, covering 15-64 years, 1997-2015 and 1934-2000 respectively. Age-Period-Cohort (APC) analysis based on the Poisson distribution was used to estimate the effects of age, period and cohort on mortality due to TB and HIV. A total of 5,825,502 adult deaths were recorded for the period 1997 to 2015, of which 910,731 (15.6%) were TB deaths and 252,101 (4.3%) were HIV deaths. A concave-down association between TB mortality and period was observed, while an upward trend was observed for HIV-related mortality. Estimated TB relative mortality showed a concave-down association with age, peaking at 36-38 years, and a concave-down relationship with period between 1997 and 2015. Findings showed a general downward trend in TB mortality across birth cohorts, with the 1934 cohort having the highest mortality rates. There was a flatter inverted-U-shaped association between age and HIV-related mortality, most pronounced at 30-32 years; an inverted-U-shaped relationship between HIV-related mortality and period from 1997 to 2015; and an inverted-V-shaped relationship between birth cohort and HIV-related mortality. In summary, the study found an inverted-U-shaped association of TB-related mortality with age and period and a general downward trend with birth cohort for deaths reported between 1997 and 2015, and a concave-down relationship of HIV-related mortality with age and period with an inverted-V shape across birth cohorts. After adjustment, the association between HIV-related mortality and period differs from the officially reported trend, which shows an upward progression. Using APC analysis, we found secular trends in TB- and HIV-related mortality rates which could provide useful input for long-term planning, monitoring and evaluation.
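As a rough illustration of the modelling approach this abstract describes, the sketch below fits an age-period-cohort Poisson rate model in R with a log person-years offset. The data frame apc and all of its values are synthetic stand-ins, not the Statistics South Africa data.

```r
# Hedged sketch of an APC Poisson rate model; 'apc' is a synthetic
# stand-in with one row per age-period cell.
set.seed(1)
apc <- expand.grid(age = seq(15, 63, by = 3), period = seq(1997, 2015, by = 3))
apc$cohort <- apc$period - apc$age                 # birth cohort (3-year bands)
apc$pop    <- round(runif(nrow(apc), 5e4, 2e5))    # illustrative person-years
apc$deaths <- rpois(nrow(apc), apc$pop * 1e-3)     # illustrative death counts

# Poisson model for rates: log E[deaths] = log(pop) + age + period + cohort.
# Because cohort = period - age, the three factors are linearly dependent
# (the classical APC identifiability problem); glm() reports the aliased
# coefficient as NA.
fit <- glm(deaths ~ factor(age) + factor(period) + factor(cohort),
           offset = log(pop), family = poisson, data = apc)
summary(fit)
exp(coef(fit))   # mortality rate ratios relative to the reference levels
```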
Item: Analysis of a binary response: an application to entrepreneurship success in South Sudan. (2012) Lugga, James Lemi John Stephen.; Zewotir, Temesgen Tenaw.

Just over half (50.6%) of the population of South Sudan lives on less than one US dollar a day, and three quarters of the population live below the poverty line (NBS, Poverty Report, 2010). Generally, effective government policy to reduce unemployment and eradicate poverty focuses on stimulating new businesses. Micro and small enterprises (MSEs) are the major source of employment and income for many in under-developed countries. The objective of this study is to identify factors that determine business success and failure in South Sudan. To achieve this objective, generalized linear models, survey logistic models, generalized linear mixed models and multiple correspondence analysis are used. The data used in this study come from the business survey conducted in 2010. The response variable, business success or failure, was measured by profit or loss in the business. Fourteen explanatory variables were identified as factors contributing to business success or failure. A main-effects model consisting of the fourteen explanatory variables together with three interaction effects was fitted to the data. In order to account for the complexity of the survey design, survey logistic and generalized linear mixed models were then refitted to the same variables as in the main-effects model. To confirm the results from the models, multiple correspondence analysis was used.

Item: Analysis of time-to-event data including frailty modeling. (2006) Phipson, Belinda.; Mwambi, Henry Godwell.

There are several methods of analysing time-to-event data. These include nonparametric approaches such as Kaplan-Meier estimation and parametric approaches such as regression modeling. Parametric regression modeling involves specifying the distribution of the survival time of the individuals, commonly chosen to be exponential, Weibull, log-normal, log-logistic or gamma. Another well-known model that does not require assumptions about the hazard function is the Cox proportional hazards model. However, there may be deviations from proportional hazards which can be explained by unaccounted random heterogeneity. In the early 1980s, a series of studies raised concern about possible bias in the estimated treatment effect when important covariates are omitted. Other problems may be encountered with the traditional proportional hazards model when the data are correlated, for instance under clustering. A method of handling these types of problems is frailty modeling, whereby a random effect is incorporated in the Cox proportional hazards model. While this concept is fairly simple to understand, estimation of the fixed and random effects becomes complicated. Various methods have been explored by several authors, including the Expectation-Maximisation (EM) algorithm, the penalized partial likelihood approach, Markov Chain Monte Carlo (MCMC) methods, the Monte Carlo EM approach and different methods using the Laplace approximation. The lack of available software is problematic for fitting frailty models. These models are usually computationally intensive and may have long processing times. However, frailty modeling is an important aspect to consider, particularly if the Cox proportional hazards model does not adequately describe the distribution of survival time.
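The penalized partial likelihood route to frailty estimation mentioned in this abstract is implemented in R's survival package. A minimal sketch on the package's own kidney data (infection recurrence times clustered within patients) might look like this:

```r
# Minimal shared (gamma) frailty sketch with the survival package.
library(survival)
data(kidney)   # recurrence times, clustered within patients (id)

# frailty(id) adds a multiplicative per-patient random effect to the
# Cox proportional hazards model, fitted by penalized partial likelihood.
fit <- coxph(Surv(time, status) ~ age + sex + frailty(id, distribution = "gamma"),
             data = kidney)
summary(fit)   # fixed effects plus the estimated frailty variance
```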
Item: The application of classification techniques in modelling credit risk. (2014) Mushava, Jonah.; Murray, Michael.

The aim of this dissertation is to examine the use of classification techniques to model credit risk through a practice known as credit scoring. In particular, the focus is on one parametric class of classification techniques and one non-parametric class. Since the goal of credit scoring is to improve the quality of decisions in evaluating a loan application, advanced methods that improve upon the performance of linear discriminant analysis (LDA) and classification and regression trees (CART) are explored. For LDA these methods include quadratic discriminant analysis (QDA), flexible discriminant analysis (FDA) and mixture discriminant analysis (MDA). Multivariate adaptive regression splines (MARS) are used in the FDA procedure, and an Expectation-Maximisation (EM) algorithm for estimating the model parameters in MDA is developed. Techniques that help to improve the performance of CART, such as bagging, random forests and boosting, are also discussed at length. A real-life dataset is used to illustrate how these credit-scoring models can classify a new applicant. The dataset is split into a 'learning sample', used to develop the credit-scoring model (also known as a scorecard), and a 'testing sample', used to test the predictive capability of the constructed scorecard. The predictive performance of the scorecards is assessed using four measures: the classification error rate, sensitivity, specificity and the area under the ROC curve (AUC). Based on these four performance measures, the empirical results reveal that there is no single ideal scorecard for modelling credit risk, because such a conclusion depends on the aims and objectives of the lender, the details of the problem and the data structure.
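The thesis data are not public, so the sketch below only mimics the scorecard workflow on a small built-in dataset: fit LDA and CART alongside a random forest on a learning sample, then compare error on a testing sample.

```r
# Hedged scorecard-style comparison: LDA vs CART vs random forest.
# iris (reduced to two classes) stands in for real default/non-default data.
library(MASS)          # lda()
library(rpart)         # CART
library(randomForest)  # bagging-based ensemble of trees

set.seed(1)
d <- droplevels(iris[iris$Species != "setosa", ])
train <- sample(nrow(d), 0.7 * nrow(d))
test  <- d[-train, ]

fit_lda  <- lda(Species ~ ., data = d[train, ])
fit_cart <- rpart(Species ~ ., data = d[train, ], method = "class")
fit_rf   <- randomForest(Species ~ ., data = d[train, ])

mean(predict(fit_lda, test)$class            == test$Species)  # LDA accuracy
mean(predict(fit_cart, test, type = "class") == test$Species)  # CART accuracy
mean(predict(fit_rf, test)                   == test$Species)  # RF accuracy
```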
Item: The application of multistate Markov models to HIV disease progression. (2011) Reddy, Tarylee.; Mwambi, Henry Godwell.

Survival analysis is a well-developed area which explores time to a single event. In some cases, however, such methods may not adequately capture the disease process, as progression may involve intermediate events of interest. Multistate models incorporate multiple events or states. This thesis proposes to demystify the theory of multistate models through an application-based approach. We present the key components of multistate models, relevant derivations, model diagnostics and techniques for modeling the effect of covariates on transition intensities. The methods developed in the thesis are applied to HIV and TB data partly sourced from CAPRISA and the HPP programmes at the University of KwaZulu-Natal. HIV progression is investigated through a five-state Markov model with reversible transitions, where state 1: CD4 count ≥ 500; state 2: 350 ≤ CD4 count < 500; state 3: 200 ≤ CD4 count < 350; state 4: CD4 count < 200; and state 5: ARV initiation. The mean sojourn time in each state and the transition probabilities are presented, as well as the effects of the covariates age, gender and baseline CD4 count on transition rates. A key finding, consistent with previous research, is that the rate of decline in CD4 count tends to decrease at lower levels of the marker. Further, patients enrolling with a CD4 count less than 350 had a far lower chance of immune recovery and a substantially higher chance of immune deterioration than patients with a higher CD4 count. We noted that older patients tend to progress more rapidly through the disease than younger patients.
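Panel-observed multistate models of this kind can be fitted with the msm package in R. The sketch below uses the package's bundled cav data (a four-state heart-transplant example from the msm manual); the thesis's five-state CD4/ARV model would follow the same pattern, with a qmatrix of allowed transitions and covariates acting on the transition intensities.

```r
# Hedged multistate Markov sketch with the msm package.
library(msm)
data(cav)   # ships with msm: CAV states 1-3 plus death (state 4)

# Nonzero off-diagonal entries mark the transitions allowed in continuous
# time; the values are only rough starting intensities.
Q <- rbind(c(0,    0.25, 0,    0.25),
           c(0.16, 0,    0.16, 0.16),
           c(0,    0.25, 0,    0.25),
           c(0,    0,    0,    0))    # state 4 absorbing

fit <- msm(state ~ years, subject = PTNUM, data = cav,
           qmatrix = Q, covariates = ~ sex)
pmatrix.msm(fit, t = 1)   # one-year transition probability matrix
sojourn.msm(fit)          # mean sojourn time in each transient state
```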
Item: An application of some inventory control techniques. (1992) Samuels, Carol Anne.; Moolman, W. H.; Ryan, K. C.

No abstract available.

Item: Application of statistical multivariate techniques to wood quality data. (2010) Negash, Asnake Worku.; Mwambi, Henry Godwell.; Zewotir, Temesgen Tenaw.

Sappi is one of the leading producers and suppliers of Eucalyptus pulp to the world market. It is also a major contributor to the South African economy, providing employment to rural people through its large plantations and earning export revenue. Pulp mills' production of quality wood pulp is mainly affected by the supply of non-uniform raw material, namely Eucalyptus trees, from various plantations. Improvement in the quality of the pulp depends directly on improvement in the quality of the raw materials, so knowing which factors affect pulp quality is important for tree breeders. The main objective of this research is, first, to determine which anatomical and chemical properties of wood are significant factors affecting the pulp properties viscosity, brightness and yield; and secondly, to investigate the effects of plantation location, site quality, tree age and species type on these same pulp properties. To meet these objectives, data were obtained from Sappi's P186 trial and from two published reports of the Council for Scientific and Industrial Research (CSIR). Principal component analysis, cluster analysis, multiple regression analysis and multivariate linear regression analysis were used to compare pulp quality measurements (viscosity, brightness and yield) across trees of different age, location, site quality and hybrid type. The results indicate that these four factors (age, location, site quality and hybrid type) and some anatomical and chemical measurements (fibre lumen diameter, kappa number, total hemicelluloses and total lignin) have significant effects on pulp quality measurements.
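For readers unfamiliar with the multivariate toolkit listed above, the base-R sketch below runs the same three techniques (principal components, hierarchical clustering, multiple regression) on the built-in mtcars data; the wood-quality variables themselves are not public.

```r
# Hedged sketch of the multivariate techniques named above, on stand-in data.
pca <- prcomp(mtcars, scale. = TRUE)   # principal component analysis
summary(pca)                           # proportion of variance per component

hc <- hclust(dist(scale(mtcars)))      # hierarchical cluster analysis
plot(hc)

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)  # multiple regression
summary(fit)
```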
Item: Application of survival analysis methods to study under-five child mortality in Uganda. (2013) Nasejje, Justine.; Achia, Thomas Noel Ochieng.; Mwambi, Henry G.

Infant and child mortality rates are key health indicators for a community or country. The fourth Millennium Development Goal called for all United Nations member countries to reduce their infant and child mortality rates by two-thirds by 2015. Uganda is one of the countries in sub-Saharan Africa with high infant and child mortality rates, and therefore needs to identify the factors most strongly associated with these high rates in order to provide alternative interventions or maintain existing ones. The Uganda Demographic and Health Survey (UDHS), funded by USAID, UNFPA, UNICEF, Irish Aid and the United Kingdom government, provides a dataset which is rich in information. This information has attracted many researchers, and some of it can be used to help Uganda monitor its infant and child mortality rates to achieve the fourth Millennium Development Goal. Survival analysis techniques and frailty modelling are well-developed statistical tools for analysing time-to-event data. These methods were adopted in this thesis to examine factors affecting under-five child mortality in Uganda, using the 2011 UDHS data and the R and STATA software. Results obtained by fitting Cox proportional hazards and frailty models, with inference drawn under both frequentist and Bayesian approaches, showed that demographic factors (sex of the household head, sex of the child and number of births in the past one year) are strongly associated with high under-five child mortality rates. Heterogeneity, or unobserved covariates, was found to be significant at the household level but insignificant at the community level.

Item: Applications of Lévy processes in finance. (2009) Essay, Ahmed Rashid.; O'Hara, J. G.

The option pricing theory set forth by Black and Scholes assumes that the underlying asset can be modeled by geometric Brownian motion, with the Brownian motion being the driving force of uncertainty. Recent empirical studies (Dotsis, Psychoyios & Skiadopolous (2007) [17]) suggest that the use of Brownian motion alone is insufficient to accurately describe the evolution of the underlying asset. A more realistic description of the underlying asset's dynamics includes random jumps in addition to the Brownian motion, which leads naturally to the concept of a Lévy process. Lévy processes serve as building blocks for stochastic processes that include jumps in addition to Brownian motion. In this dissertation we first examine the structure and nature of an arbitrary Lévy process. We then introduce the stochastic integral for Lévy processes as well as the extended version of Itô's lemma, and identify exponential Lévy processes that can serve as Radon-Nikodým derivatives in defining new probability measures. Equipped with this knowledge of Lévy processes, we implement them in a financial context, with the Lévy process serving as the driving source of uncertainty in a stock price model. In particular we look at jump-diffusion models such as Merton's (1976) [37] jump-diffusion model and the jump-diffusion model proposed by Kou and Wang (2004) [30]. As the Lévy processes we consider have more than one source of randomness, we are faced with the difficulty of pricing options in an incomplete market. The options we consider are mainly European, where exercise can only occur at maturity. In addition to vanilla calls and puts, we independently derive a closed-form solution for an exchange option under Merton's jump-diffusion model, making use of conditioning arguments and stochastic integral representations. We also examine some exotic options under the Kou and Wang model, such as barrier options and lookback options, where the option price is derived in terms of Laplace transforms. We then develop the Kou and Wang model to include only positive jumps; under this revised model we compute the value of a perpetual put option along with the optimal exercise point. Keywords: derivative pricing, Lévy processes, exchange options, stochastic integration.
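As a toy version of the jump-diffusion dynamics this abstract builds on, the sketch below simulates one path of Merton's model, in which log-returns receive a compensated Poisson stream of normally distributed jumps on top of Brownian motion; all parameter values are illustrative, not taken from the dissertation.

```r
# Hedged simulation of Merton's jump-diffusion:
#   d log S = (mu - lambda*kappa - sigma^2/2) dt + sigma dW + J dN,
# with N a Poisson process and jump sizes J ~ N(muJ, sdJ^2).
set.seed(42)
T <- 1; n <- 252; dt <- T / n
mu <- 0.08; sigma <- 0.20                 # diffusion parameters
lambda <- 0.5; muJ <- -0.10; sdJ <- 0.15  # jump intensity and jump-size law
kappa <- exp(muJ + 0.5 * sdJ^2) - 1       # E[e^J] - 1, the jump compensator

logS <- numeric(n + 1); logS[1] <- log(100)
for (i in 1:n) {
  nJ   <- rpois(1, lambda * dt)                      # jumps in this step
  jump <- if (nJ > 0) sum(rnorm(nJ, muJ, sdJ)) else 0
  logS[i + 1] <- logS[i] +
    (mu - lambda * kappa - 0.5 * sigma^2) * dt +
    sigma * sqrt(dt) * rnorm(1) + jump
}
plot(seq(0, T, by = dt), exp(logS), type = "l",
     xlab = "t", ylab = "S(t)", main = "Merton jump-diffusion sample path")
```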
Item: Aspects of categorical data analysis. (1998) Govender, Yogarani.; Matthews, Glenda Beverley.

The purpose of this study is to investigate and understand data which are grouped into categories. At the onset, the study presents a review of early research contributions and controversies surrounding categorical data analysis. The concept of sparseness in a contingency table refers to a table where many cells have small frequencies, and previous research findings showed that incorrect results were obtained in the analysis of sparse tables. Hence, attention is focussed in this dissertation on the effect of sparseness on the modelling and analysis of categorical data. Cressie and Read (1984) suggested a versatile alternative to previously proposed statistics: the power-divergence statistic. This study includes a detailed discussion of the power-divergence goodness-of-fit statistic, covering a review of the minimum power-divergence estimation method and the evaluation of model fit. The effects of sparseness are also investigated for the power-divergence statistic. Comparative reviews of the accuracy, efficiency and performance of the power-divergence family of statistics under large- and small-sample cases are presented. Statistical applications of the power-divergence statistic were conducted in SAS (Statistical Analysis Software). Further findings on the effect of small expected frequencies on the accuracy of the X² test are presented from the studies of Tate and Hyer (1973) and Lawal and Upton (1976). Other goodness-of-fit statistics relevant to the sparse multinomial case are discussed, including Zelterman's (1987) D² goodness-of-fit statistic, Simonoff's (1982, 1983) goodness-of-fit statistics, and Koehler and Larntz's tests for log-linear models. In addressing contradictions in the sparse-sample case under asymptotic conditions and increasing sample size, discussions are provided of Simonoff's use of nonparametric techniques to find the variances, as well as his adoption of the jackknife and bootstrap techniques.
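The Cressie-Read family discussed in this abstract is simple enough to compute directly; a short sketch with illustrative counts follows, recovering Pearson's X² at λ = 1 and the likelihood-ratio statistic G² in the limit λ → 0.

```r
# Hedged sketch of the Cressie-Read power-divergence statistic,
#   2 / (lambda * (lambda + 1)) * sum(obs * ((obs / expd)^lambda - 1)).
power_divergence <- function(obs, expd, lambda = 2/3) {
  if (abs(lambda) < 1e-8) {
    return(2 * sum(obs * log(obs / expd)))   # limiting case: G^2
  }
  2 / (lambda * (lambda + 1)) * sum(obs * ((obs / expd)^lambda - 1))
}

obs  <- c(43, 12, 8, 37)                 # illustrative counts
expd <- sum(obs) * rep(1/4, 4)           # expected under a uniform null
power_divergence(obs, expd, lambda = 1)    # Pearson's X^2
power_divergence(obs, expd, lambda = 0)    # likelihood-ratio G^2
power_divergence(obs, expd, lambda = 2/3)  # Cressie-Read's recommended value
pchisq(power_divergence(obs, expd, 2/3), df = length(obs) - 1,
       lower.tail = FALSE)                 # asymptotic p-value
```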
Item: Bayesian data augmentation using MCMC: application to missing values imputation on cancer medication data. (2017) Ndlela, Thamsanqa Innocent.; Lougue, Siaka.

Missing data is a serious issue that negatively affects the inferences and findings of researchers in data science and statistics. Ignoring missing data, or deleting cases that contain missing observations, may reduce statistical power, lose information, inflate the standard errors of estimates and increase estimation bias in data analysis. One of the advantages of using imputation methods is that they keep the full sample size, which makes results more precise. Among missing-data imputation techniques, data augmentation is not especially popular in the literature, and very few articles mention its use to account for missing-data problems. Data augmentation can be used for the imputation of missing data in both Bayesian and classical statistics. In the classical approach, data augmentation is implemented through the EM algorithm, which uses the maximum likelihood function to impute and estimate the unknown parameters of a model; the EM algorithm is a useful tool for likelihood-based decisions when dealing with missing-data problems. The Bayesian data augmentation approach is used when it is not possible to directly estimate the posterior distribution P(θ | x_obs) of the parameters given the observed data x_obs, due to the missing values in x. This study aims to contribute to a better understanding of Bayesian data augmentation and to improve the quality of estimates and the precision of analyses of data with missing values. The General Household Survey [GHS 2015] is the main source of data in this study. All analyses are carried out in R, more precisely with the package mix. The study finds that Bayesian data augmentation can solve the problem of missing data in cancer drug intake data, and that it performs very well in improving the modelling of cancer medication data affected by missing values.
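The R package mix named in this abstract follows Schafer's prelim/em/da/imp workflow for mixed categorical-continuous data. The sketch below is an assumed minimal run on toy data (the GHS 2015 variables are not reproduced here); the exact function behaviour should be checked against the package documentation.

```r
# Hedged data-augmentation sketch with the 'mix' package (Schafer).
library(mix)

set.seed(1)
x <- cbind(sample(1:2, 50, replace = TRUE),   # one categorical column (coded 1, 2)
           matrix(rnorm(100), 50, 2))          # two continuous columns
x[sample(length(x), 15)] <- NA                 # knock out some values at random

s      <- prelim.mix(x, p = 1)      # p = number of leading categorical columns
theta0 <- em.mix(s)                 # EM estimate used as a starting value
rngseed(1234)                       # must be called before da.mix()
theta  <- da.mix(s, theta0, steps = 200)  # data augmentation (MCMC) steps
ximp   <- imp.mix(s, theta, x)      # one completed (imputed) dataset
head(ximp)
```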
Item: Bayesian hierarchical spatial and spatio-temporal modeling and mapping of tuberculosis in Kenya. (2013) Iddrisu, Abdul-Karim.; Mwambi, Henry G.; Achia, Thomas Noel Ochieng.

The global spread of infectious disease threatens the health of humans, domestic animals and wildlife. A proper understanding of the global distribution of these diseases is an important part of disease management and policy making. However, the data are complicated by heterogeneity across host classes and by space-time epidemic processes [Waller et al., 1997, Hosseini et al., 2006]. The use of frequentist methods in biostatistics and epidemiology is common, and they are extensively utilized in answering varied research questions. In this thesis, we propose a hierarchical Bayesian approach to study the spatial and spatio-temporal pattern of tuberculosis in Kenya [Knorr-Held et al., 1998, Knorr-Held, 1999, López-Quílez and Muñoz, 2009, Waller et al., 1997, Julian Besag, 1991]. The space-time interaction of risk (ψij) is an important factor considered in this thesis. Markov Chain Monte Carlo (MCMC) methods, via WinBUGS and R packages, were used for simulation [Ntzoufras, 2011, Congdon, 2010, David et al., 1995, Gimenez et al., 2009, Brian, 2003], and the Deviance Information Criterion (DIC), proposed by [Spiegelhalter et al., 2002], was used for model comparison and selection. Variation in TB risk is observed among Kenyan counties, with clustering among counties with high TB relative risk (RR). HIV prevalence is identified as the dominant determinant of TB. We found clustering and heterogeneity of risk among high-rate counties, and the overall TB risk decreased slightly from 2002 to 2009. The space-time interaction of TB relative risk is found to be increasing among rural counties that share boundaries with urban counties with high TB risk. This results from the ability of the models to borrow strength from neighbouring counties, so that nearby counties have similar risk. Although the approaches are less than ideal, we hope that our formulations provide a useful stepping stone in the development of spatial and spatio-temporal methodology for the statistical analysis of TB risk in Kenya.

Item: Classification of banking clients according to their loan default status using machine learning algorithms. (2022) Reddy, Suveshnee.; Chifurira, Retius.; Zewotir, Temesgen Tenaw.

Loan lending has become crucial for both individuals and companies. For lending institutions, although profitable, it can be very risky due to clients defaulting on their loan agreements. Credit risk assessment is a critical process carried out by most lending institutions; it reduces the possibility of lending to clients who will default on their repayments, but it does not eliminate the problem, so a collections process which aims to retrieve unpaid debt is also necessary. With South Africa facing another recession, worsened by the lockdown during the Covid-19 pandemic, lending institutions can expect an increase in the number of loan defaulters. To counter this increase, changes will have to be made to their policies and processes, either to loan application procedures (e.g. credit risk assessment, affordability assessment, et cetera) or to post-disbursal procedures (e.g. collections processes). The aim of this study is to predict whether a client will default on his/her loan, using machine learning algorithms, in order to enhance the collections process of the financial institution under study; here, default is defined as missing at least three payments in the first 12 months of the loan being granted. A logistic regression model, decision tree, random forest, support vector machine, Naïve Bayes classifier, k-nearest neighbours algorithm and artificial neural network were fitted to the balanced dataset. The analysis used loan data from a South African financial institution for the period August 2019 to December 2019, with variables covering a client's demographics, income, expenses and debt, as well as loan information. Exploratory data analysis (EDA) was utilised to analyse the dataset and summarise its main characteristics. To reduce the dimensionality of the dataset, two techniques were used: principal component analysis (PCA), which also corrects the data for multicollinearity, and feature selection (recursive feature elimination). Each model was fitted to the dataset under both techniques, and the confusion matrix and metrics such as balanced accuracy, the true positive ratio, the true negative ratio, the AUC score and the Gini coefficient were used to evaluate the models and determine which performed best for this application problem. The results show that under the PCA approach the random forest model performed best, with a balanced accuracy score, true positive ratio and AUC score of 0.69, 0.74 and 0.74, respectively. The random forest model also performed best under feature selection, with a balanced accuracy score, true positive ratio and AUC score of 0.69, 0.74 and 0.75, respectively. Comparing the two random forest models showed only a marginal difference in each performance metric, but the PCA model utilised 48 variables whereas the feature selection model utilised only 18, and the latter thus seemed more suitable for the classification problem under study. The results of this study are expected to benefit analysts and data scientists in financial institutions who would like to identify robust machine learning algorithms for classifying defaulting clients. This study is also of significance to policy makers who want to identify the risk factors associated with loan-defaulting clients.
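The modelling pipeline described here can be mimicked end to end in R; the sketch below trains a random forest on synthetic income/expense/debt-style features and computes the confusion matrix, AUC and Gini coefficient named above. The simulated data are purely illustrative, not the institution's loan book.

```r
# Hedged default-classification sketch: random forest + ROC metrics.
library(randomForest)
library(pROC)

set.seed(7)
n <- 2000
loans <- data.frame(income   = rlnorm(n, 9),
                    expenses = rlnorm(n, 8),
                    debt     = rlnorm(n, 7.5),
                    term     = sample(12:60, n, replace = TRUE))
p <- plogis(-1 + scale(loans$debt) - scale(loans$income))  # synthetic risk
loans$default <- factor(rbinom(n, 1, p), labels = c("good", "bad"))

train <- sample(n, 0.7 * n)
rf <- randomForest(default ~ ., data = loans[train, ])

pred  <- predict(rf, loans[-train, ])
probs <- predict(rf, loans[-train, ], type = "prob")[, "bad"]
table(observed = loans$default[-train], predicted = pred)  # confusion matrix
roc_obj <- roc(loans$default[-train], probs)
auc(roc_obj)           # area under the ROC curve
2 * auc(roc_obj) - 1   # Gini coefficient, as used in the study
```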
Item: A comparison of cancer classification methods based on microarray data. (2018) Mohammed, Mohanad Mohammed Adam.; Mwambi, Henry Godwell.; Omolo, Bernard.

Cancer is among the leading causes of death in both developed and developing countries. Gene expression profiling of tumours has enhanced the accuracy of cancer classification, leading to correct diagnoses and the application of effective therapies. Here, we present a comparative review of the binary-class predictive ability of seven classification methods (support vector machines with the radial basis kernel (SVM(RK)), linear kernel (SVM(LK)) and polynomial kernel (SVM(PK)); artificial neural networks (ANN); random forests (RF); k-nearest neighbours (KNN); and naive Bayes (NB)), using publicly available gene expression data from cancer research. Results indicate that NB outperformed the other methods in terms of the accuracy, sensitivity, specificity, kappa coefficient, area under the curve (AUC) and balanced error rate (BER) of the binary classifier. Thus, overall, the naive Bayes (NB) approach turned out to be the best classifier for our datasets.
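Two of the seven classifiers compared in this abstract (naive Bayes and an RBF-kernel SVM) are available in the e1071 package; the sketch below runs them on a synthetic two-class "expression matrix", since the study's public microarray datasets are not bundled here.

```r
# Hedged sketch: NB vs SVM(RK) on a synthetic microarray-style matrix.
library(e1071)

set.seed(3)
n <- 80; g <- 200                        # 80 samples, 200 "genes"
X <- matrix(rnorm(n * g), n, g,
            dimnames = list(NULL, paste0("g", 1:g)))
y <- factor(rep(c("tumour", "normal"), each = n / 2))
X[y == "tumour", 1:10] <- X[y == "tumour", 1:10] + 1  # differential signal

train <- sample(n, 60)
nb      <- naiveBayes(X[train, ], y[train])
svm_rbf <- svm(X[train, ], y[train], kernel = "radial")

mean(predict(nb, X[-train, ])      == y[-train])  # NB test accuracy
mean(predict(svm_rbf, X[-train, ]) == y[-train])  # SVM(RK) test accuracy
```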
Item: Completion of an incomplete market by quadratic variation assets. (2011) Mgobhozi, Sivuyile Wiseman.; Mataramvura, Sure.

It is well known that general geometric Lévy market models are incomplete, except for the geometric Brownian and geometric Poissonian cases, but such a market can be completed by enlarging it with power-jump assets, as Corcuera and Nualart [12] did in their paper. With the knowledge that a market which is incomplete due to jumps can be completed, we look at other cases of incompleteness: incompleteness due to more sources of randomness than tradable assets, to transaction costs, and to stochastic volatility. We show that such markets are incomplete and propose ways to complete them. In the case of incompleteness due to more sources of randomness than tradable assets, we enlarge the market using the market's underlying quadratic variation assets and show that the market can thereby be completed. Looking at a market paying transaction costs, which is also incomplete due to the difference between the buyer's and seller's prices, we show that a market paying transaction costs such as the one given by Cvitanić and Karatzas [13] can be completed. Empirical findings have shown that the Black-Scholes assumption of constant volatility is inaccurate (see Tompkins [40] for empirical evidence). Volatility is in some sense stochastic, and stochastic volatility models are divided into two broad classes: single-factor models, which have only one source of randomness and are complete market models, and multi-factor models, in which other random elements are introduced and which are hence incomplete market models. In this project we look at some commonly used multi-factor models and attempt to complete one of them.

Item: A complex survey data analysis of TB and HIV mortality in South Africa. (2012) Murorunkwere, Joie Lea.; Thomas, Achia.; Mwambi, Henry G.

Many countries in the world record annual summary statistics, such as economic indicators like Gross Domestic Product (GDP) and vital statistics like the numbers of births and deaths. In this thesis we focus on mortality data from various causes, including Tuberculosis (TB) and HIV. TB is an infectious disease caused by the bacterium Mycobacterium tuberculosis and is the main cause of death among all infectious diseases in the world. An additional complexity is that HIV/AIDS acts as a catalyst for the occurrence of TB: Vaidyanathan and Singh revealed that people infected with Mycobacterium tuberculosis alone have an approximately 10% lifetime risk of developing active TB, compared to 60% or more in persons co-infected with HIV and Mycobacterium tuberculosis. South Africa was ranked by the World Health Organization as seventh highest among the 22 TB high-burden countries in the world, and fourth highest in Africa. The research work in this thesis uses the 2007 Statistics South Africa (STATSSA) data on TB and HIV as primary causes of death to build statistical models that can be used to investigate factors associated with death due to TB. Logistic regression, survey logistic regression and generalized linear models (GLMs) are used to assess the effects of risk factors or predictors on the probability of death associated with TB and HIV, guided by a theoretical approach to understanding factors associated with TB and HIV deaths. Bayesian modeling using WinBUGS is used for spatial modeling of relative risk and for spatial prior distributions in disease-mapping models. Of the 615,312 deceased, 546,917 (89%) died from natural causes, 14,179 (2%) were stillborn and 54,216 (9%) died from non-natural causes, possibly accidents, murder or suicide. Among those who died from natural causes and disease, 65,052 (12%) died of TB and 13,718 (2%) died of HIV. The results of the analysis revealed the risk factors associated with TB and HIV mortality.

Item: Concepts for the construction of confidence intervals for measuring stability after hallux valgus surgery: theoretical development and application. (2021) Ogutu, Sarah Atieno.; Mwambi, Henry Godwell.; Ziegler, Andreas.

The absolute change in the corrected angle, measured immediately after surgery and again after bone healing, is a clinically relevant endpoint for judging an osteotomy's stability. The primary objective of this research is to illustrate the non-inferiority of a novel screw used for fixation of the osteotomy compared with a standard screw. If the difference in the angles after surgery and after bone healing can be assumed to be normally distributed, the absolute change follows the folded normal distribution. The most natural approach to presenting the clinical study results is a confidence interval comparing two folded normal distributions. We construct confidence intervals for comparing two independent folded normal distributions using the ratio of two chi-square random variables, the difference of two chi-square random variables, and the bootstrap method, and we illustrate the approaches with a study on hallux valgus osteotomy. The proposed confidence intervals permit an investigation of non-inferiority between the two treatment groups in clinical trials with endpoints following a folded normal distribution. The application to real data indicates that the confidence intervals based on the ratio of two chi-square random variables and on the bootstrap are straightforward and easy to calculate. Bootstrapping was asymptotically more accurate than the standard interval obtained from samples assumed to be normal, and it was an appropriate way to ascertain the stability of the results. Judging by the margin δ under the bootstrap method, we establish non-inferiority for the new surgical method. In conclusion, the approaches are promising, and we recommend them for comparing other practical data that require the use of the folded normal distribution.
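Of the three interval constructions above, the bootstrap is the easiest to sketch. Below, absolute angle changes for a hypothetical novel and standard screw group are simulated (each |N(μ, σ²)|, i.e. folded normal), and a percentile-bootstrap interval for the difference in means is checked against a non-inferiority margin δ; all numbers are illustrative.

```r
# Hedged percentile-bootstrap sketch for comparing two folded normal means.
set.seed(11)
novel    <- abs(rnorm(40, mean = 1.0, sd = 2))  # |angle change|, novel screw
standard <- abs(rnorm(40, mean = 1.2, sd = 2))  # |angle change|, standard screw

B <- 5000
boot_diff <- replicate(B, {
  mean(sample(novel, replace = TRUE)) - mean(sample(standard, replace = TRUE))
})
ci <- quantile(boot_diff, c(0.025, 0.975))  # 95% percentile bootstrap CI
ci

# Non-inferiority against a margin delta: the novel screw is non-inferior
# if the upper confidence limit for (novel - standard) stays below delta.
delta <- 0.5
ci[2] < delta
```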
Item: Count data modelling application. (2019) Ibeji, Jecinta Ugochukwu.; North, Delia Elizabeth.; Zewotir, Temesgen Tenaw.

The rapid increase in the total number of children ever born without a proportionate growth in the Nigerian economy has been a concern, and making predictions with count data requires an appropriate regression model. As count data take discrete, non-negative values, the Poisson distribution is the natural candidate to describe them, but its restriction that the variance equals the mean is often violated. The resulting under- or over-dispersion biases the estimated standard errors, rendering test statistics incorrect. This study aimed to model count data, with total children ever born as the application, using negative binomial and generalized Poisson regression. Data on women aged 15-49 years from the Nigeria Demographic and Health Survey 2013 were used, and three models were applied to investigate the factors affecting the number of children ever born. Predictive count modelling was also carried out, based on the performance evaluation metrics root mean square error, mean absolute error, R-squared and mean square error. In the inferential modelling, the generalized Poisson model was found to be superior, with age of household head (P < .0001), age of respondent at the time of first birth (P < .0001), urban-rural status (P < .0001) and religion (P < .0001) significantly associated with total children ever born. In the predictive modelling, all three models showed almost identical performance evaluation metrics, but Poisson regression was chosen as the best because it is the simplest model. In conclusion, early marriage, religious beliefs and lack of awareness among women who dwell in rural areas should be addressed to control the total number of children ever born in Nigeria.

Item: Estimating risk determinants of HIV and TB in South Africa. (2009) Mzolo, Thembile.; Mwambi, Henry G.; Zuma, Khangelani.

It is on TB that HIV/AIDS has had its greatest adverse impact. People with TB who are infected with HIV are at increased risk of dying from TB rather than HIV, and TB is the leading cause of death in HIV-positive individuals in South Africa. HIV is the driving factor that increases the risk of progression from latent TB to active TB. In South Africa, no coherent analysis of the risk determinants of HIV and TB has been done at the national level; this study seeks to fill that gap. Risk determinants of HIV and TB are estimated using the national household survey conducted by the Human Sciences Research Council in 2005. Since individuals from the same household and enumerator area are likely to be more alike in terms of disease risk, i.e. correlated with one another, generalized estimating equations (GEEs) are used to correct for this potential intraclass correlation. Disease occurrence and distribution are highly heterogeneous at the population, household and individual levels. In recognition of this fact, we model this heterogeneity at the community level through GLMMs and Bayesian hierarchical modelling approaches, with enumerator area indicating the community effect. The results showed that HIV is driven by sex, age, race, education, health and condom use at sexual debut; factors associated with TB are HIV status, sex, education, income and health; and the factors common to both diseases are sex, education and health. The results showed that ignoring the intraclass correlation can result in biased estimates, while inference drawn from the GLMM and Bayesian approaches provides some degree of confidence in the results. The positive correlation found at the enumerator area level for both HIV and TB indicates that interventions should be aimed at the area level rather than at the individual level.

Item: Estimating the force of infection from prevalence data: infectious disease modelling. (2013) Balakrishna, Yusentha.; Mwambi, Henry G.

By knowing the incidence of an infectious disease, we can ascertain the high-risk factors of the disease as well as the effectiveness of awareness programmes and treatment strategies. Since the work of Hugo Muench in 1934, many methods of estimating the force of infection have been developed, each with its own advantages and disadvantages. The objective of this thesis is to explore the different compartmental models of infectious diseases and to establish and interpret the parameters associated with them. Seven models formulated to estimate the force of infection are discussed and applied to data obtained from CAPRISA. The data were age-specific HIV prevalence data based on antenatal clinic attendees from the Vulindlela district in KwaZulu-Natal. The link between the survivor function, the prevalence and the force of infection is demonstrated, and generalized linear model methodology is used to estimate the force of infection. Parametric and nonparametric force-of-infection models were fitted to data from 2009 to 2010; the best-fitting model was determined and thereafter applied to data from 2002 to 2010, and the resulting trends in HIV incidence and prevalence were evaluated. It should be noted that the sample size for the year 2002 was considerably smaller than that of the following years, which resulted in slightly inaccurate estimates for 2002. Despite the general increase in HIV prevalence (from 54.07% in 2003 to 61.33% in 2010), the rate of new HIV infections was found to be decreasing. The results also showed that the age at which the force of infection peaked increased from 16.5 years in 2003 to 18 years in 2010. Farrington's two-parameter model for estimating the force of HIV infection was shown to be the most useful. The results emphasise the importance of targeting HIV awareness campaigns at the 15- to 19-year-old age group, and suggest that using prevalence alone as a measure of disease can be misleading: it should rather be used in conjunction with incidence estimates to determine the success of intervention and control strategies.
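Farrington's two-parameter model singled out above has force of infection λ(a) = αa·exp(−βa), which gives prevalence π(a) = 1 − exp(−Λ(a)) with cumulative hazard Λ(a) = (α/β²)(1 − (1 + βa)e^(−βa)). The sketch below fits it by nonlinear least squares to synthetic age-specific prevalence; the CAPRISA data and the GLM formulation used in the thesis are not reproduced here.

```r
# Hedged sketch: fitting Farrington's two-parameter force-of-infection model
#   lambda(a) = alpha * a * exp(-beta * a)
# to age-specific prevalence via pi(a) = 1 - exp(-Lambda(a)).
Lam <- function(a, alpha, beta) {
  (alpha / beta^2) * (1 - (1 + beta * a) * exp(-beta * a))  # integral of lambda
}

set.seed(5)
age      <- 15:40
n_tested <- rep(100, length(age))                 # synthetic sample sizes
n_pos    <- rbinom(length(age), n_tested,
                   1 - exp(-Lam(age, alpha = 0.05, beta = 0.12)))
prev     <- n_pos / n_tested

fit <- nls(prev ~ 1 - exp(-Lam(age, alpha, beta)),
           start = list(alpha = 0.03, beta = 0.10))
coef(fit)
1 / coef(fit)["beta"]   # age at which the fitted force of infection peaks

lambda_hat <- coef(fit)["alpha"] * age * exp(-coef(fit)["beta"] * age)
plot(age, lambda_hat, type = "l", xlab = "age",
     ylab = "estimated force of infection")
```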