Doctoral Degrees (Statistics)
Permanent URI for this collection: https://hdl.handle.net/10413/7126
Browsing Doctoral Degrees (Statistics) by SDG "SDG4"
Now showing 1 - 8 of 8
Analysis of discrete time competing risks data with missing failure causes and cured subjects.
(2023) Ndlovu, Bonginkosi Duncan; Zewotir, Temesgen Tenaw; Melesse, Sileshi Fanta.
This thesis is motivated by the limitations of the existing discrete time competing risks models vis-a-vis the treatment of data that comes with missing failure causes or a sizable proportion of cured subjects. The discrete time models that have been suggested to date (Davis and Lawrance, 1989; Tutz and Schmid, 2016; Ambrogi et al., 2009; Lee et al., 2018) are formulated in terms of cause-specific hazards. Clearly, this fact summarily disqualifies these models from consideration if data comes with missing failure causes. It is also a well documented fact that naive application of the cause-specific hazards to data that has a sizable proportion of cured subjects may produce downward biased estimates for these quantities. The existing models can be considered within the multiple imputation framework (Rubin, 1987) for handling missing failure causes, but the prospects of scaling them up for handling cured subjects are minimal, if not nil. In this thesis we address these issues concerning the treatment of missing failure causes and cured subjects in discrete time settings. Towards that end, we focus on the mixture model (Larson and Dinse, 1985) and the vertical model (Nicolaie et al., 2010) because these models possess certain properties which dovetail with the objectives of this thesis. The mixture model has been extended into a model that can handle cured subjects. Nicolaie et al. (2015) have demonstrated that the vertical model can also handle missing failure causes as is. Nicolaie et al. (2018) have also extended the vertical model to deal with cured subjects.
Our strategy in this thesis is to exploit both the mixture model and the vertical model as a launching pad to advance discrete time models for handling data that comes with missing failure causes or cured subjects.

Bayesian spatio-temporal and joint modelling of malaria and anaemia among Nigerian children aged under five years, including estimation of the effects of risk factors.
(2023) Ibeji, Jecinta Ugochukwu; Mwambi, Henry Godwell; Iddrisu, Abdul-Karim.
Childhood mortality and morbidity in Nigeria have been linked to malaria and anaemia. This thesis focused on exploring the risk factors and the complexity of the relationship between malaria and anaemia in Nigerian children under five. Data from the 2010 and 2015 Nigeria Malaria Indicator Survey conducted by the Demographic Health Survey were used. In 2010, the prevalence of malaria and anaemia was 48% and 72%, respectively, while in 2015 the respective prevalences were 27% and 68%. Machine learning-based exploratory classification methods were used to explain the relationship and patterns between the independent variables and the two dependent variables, namely malaria and anaemia. Decisions made by the public health body are centered on the administrative units (i.e., states) within the country. Therefore, the development of disease mapping was explained, together with a brief overview of its limiting assumptions and ways of tackling them. Consequently, the spatial variation of malaria and anaemia for 2010 and 2015 was analyzed with the inclusion of their respective risk factors. A separate multivariate hierarchical Bayesian logistic model for each disease was adopted to investigate the spatial pattern of malaria and anaemia and adjust for the risk factors associated with each disease. Furthermore, a multilevel model analysis was applied to independently investigate the spatio-temporal distribution of malaria and anaemia.
A joint model was further adopted to check for the relationship between malaria and anaemia and their common risk factors and relax the nonlinearity assumption. In the 2010 data, type of place of residence, mother’s highest educational level, source of drinking water, type of toilet facility, child’s sex, main floor material, and households that have electricity, radio, television, and water were significantly associated with malaria and anaemia. In the 2015 data, the type of place of residence, source of drinking water, type of toilet facility, households with radio, main roof material, wealth index, child’s sex, and mother’s highest educational level had a significant relationship with malaria and anaemia. The results from this study can guide policymakers to tailor effective interventions to reduce or prevent malaria and anaemia. This will help adequately distribute limited state health system resources, such as personnel, funds and facilities within the country.

Discrete time-to-event construction for multiple recurrent state transitions.
(2023) Batidzirai, Jesca Mercy; Manda, Samuel; Mwambi, Henry Godwell.
Recent developments in multi-state models have modelled transition intensities in discrete time rather than continuous time, whose major drawback is the possibility of biased parameter estimates arising from the handling of ties. Discrete-time models have included univariate multilevel models to account for possible dependence among specific pairwise recurrent transitions within the same subject. However, in most cases, there would be several specific pairwise transitions of interest. In such cases, there is a need to model the transitions with the aim of identifying those transitions that are correlated. This provides insight into how the transitions are related to each other.
In order to investigate the interdependencies between transitions, the unique contribution of this thesis is to propose a multivariate discrete-time multi-state model with multiple state transitions. In this model, each specific recurrent transition is associated with a random effect to capture possible dependence in the transitions of the same type or different types. The random effects themselves were then modeled by a multivariate normal distribution, and model parameters were estimated using maximum likelihood methods with Gaussian quadrature numerical integration. A simulation study was done to evaluate the performance of the proposed model. The model yielded satisfactory results for most fixed effects and random effects estimates. This is evidenced by near-zero biases and mean square errors of the average estimates, as well as high coverage probabilities of the 95% confidence intervals from 1000 replications. The proposed methodology was applied to marriage formation and dissolution data from KwaZulu-Natal province, South Africa. Five transitions were considered, namely: Never Married to Married, Married to Separated, Married to Widowed, Separated to Married and Widowed to Married. The model indicated very small unobserved subject-to-subject heterogeneity for each transition and a weak positive correlation between transitions. Statistically, the model produced smaller standard errors than the univariate models, hence more precise estimates. The multivariate modeling of discrete time-to-event models provides a better understanding of the evolution of all transitions simultaneously, thus giving, in addition to covariate effects, an assessment of how one transition is associated with another. Empirical results confirmed well known important socio-demographic predictors of entering and exiting a marriage. Age at sexual debut played a positive critical role in most of the transitions.
More educated subjects were associated with a lower likelihood of entering a first marriage, experiencing a marital dissolution, and remarrying after widowhood. Subjects who had a sexual debut at younger ages were more likely to experience a marital dissolution than those who started late. Age at first marriage had a negative association with marital dissolution. We may, therefore, postulate that existing programs that encourage delay in onset of sexual activity, for example for HIV risk reduction, may also have a positive impact on lowering rates of marital dissolution, thus ultimately improving psychological and physical health.

Financial modelling of cryptocurrency: a case study of Bitcoin, Ethereum, and Dogecoin in comparison with JSE stock returns.
(2022) Kaseke, Forbes; Ramroop, Shaun; Mwambi, Henry Godwell.
The emergence of cryptocurrency has caused a shift in the financial markets. Although it was created as a currency for exchange, cryptocurrency has been shown to be an asset, with investors seeking to profit from it rather than using it as a medium of exchange. Despite being a financial asset, cryptocurrency has distinct stylised facts like any other asset. Studying these stylised facts allows the creation of better-suited models to assist investors in making better data-driven decisions. The data used in this thesis covered three leading cryptocurrencies (Bitcoin, Ethereum, and Dogecoin), with Johannesburg Stock Exchange (JSE) data as a guide for comparison. The sample period was from 18 September 2017 to 27 May 2021. The goal was to research the stylised facts of cryptocurrencies and then create models that capture these stylised facts. The study developed risk-quantifying models for cryptocurrencies. The main findings were that cryptocurrency exhibits stylised facts that are well-known in financial data. However, the magnitude and frequency of these stylised facts tend to differ.
For example, cryptocurrency is more volatile than stock returns. The volatility also tends to be more persistent than in stocks. The study also finds that cryptocurrency has a reverse leverage effect as opposed to the normal one, where past negative returns increase volatility more than past positive returns. The study also developed a hybrid GARCH model using the extreme value theorem for quantifying cryptocurrency risk. The results showed that the GJR-GARCH with GDP innovations could be used as an alternative model to calculate the VaR. The volatile nature of cryptocurrency was also compared with that of the JSE, both with and without accounting for structural breaks. The results showed that the cryptocurrencies’ volatility patterns are similar to one another but differ from those of the JSE. The cryptocurrency market was also found to be inefficient. This finding means that some investors can take advantage of this inefficiency. The study also revealed that structural breaks affect volatility persistence. However, this persistence measure differs depending on the model used. Markov switching GARCH models were used to strengthen the structural break findings. The results showed that two-regime models outperform single-regime models. The VAR and DCC-GARCH models were also used to test the spillovers amongst the assets used. The results showed short-run spillovers from Bitcoin to Ethereum and long-run spillovers based on the DCC-GARCH. Lastly, factors affecting cryptocurrency adoption were discussed. The main factors hindering mass adoption are the complexity that comes with the use of cryptocurrency and its high volatility. This study is important as it gives investors an understanding of the nature and behaviour of cryptocurrency so that they know when and how to invest.
It also helps policymakers and financial institutions decide how to treat or use cryptocurrency within the economy.

Flexible Bayesian hierarchical spatial modeling in disease mapping.
(2022) Ayalew, Kassahun Abere; Manda, Samuel.
The Gaussian Intrinsic Conditional Autoregressive (ICAR) spatial model, which usually has two components, namely an ICAR for spatial smoothing and standard random effects for non-spatial heterogeneity, is used to estimate spatial distributions of disease risks. The normality assumption in this model may not always be correct, and misspecification of the distribution of random effects could result in biased estimation of the spatial distribution of disease risk, which could lead to misleading conclusions and policy recommendations. Few studies have compared the estimation of the spatial distributions of diseases under the ICAR-normal model with that obtained from fitting ICAR-nonnormal models. The results from these studies indicated that the ICAR-nonnormal models performed better than the ICAR-normal in terms of accuracy, efficiency and predictive capacity. However, these efforts have not fully addressed the effect on the estimation of spatial distributions under flexible specification of ICAR models in disease mapping. The overall aim of this PhD thesis was to develop approaches that relax the normality assumption that is often used in modeling and fitting of ICAR models in the estimation of spatial patterns of diseases. In particular, the thesis considered the skew-normal and skew-Laplace distributions under the univariate specification, and the skew-normal under the multivariate specification, to estimate the spatial distributions of either univariable or multivariable areal data. The thesis also considered non-parametric specification of the multivariate spatial effects in the ICAR model, which is a novel extension of an earlier work. The estimation of the models was done using Bayesian statistical approaches.
The performance of our suggested alternatives to the ICAR-normal model was evaluated through simulation studies as well as a practical application to the estimation of the district-level distribution of HIV prevalence and treatment coverage using health survey data in South Africa. Results from the simulation studies and analysis of real data demonstrated that our approaches performed better in the prediction of spatial distributions for univariable and multivariable areal data in disease mapping. This PhD has shown the limitations of relying on the ICAR-normal model for the estimation of spatial distributions for all spatial analyses, even when the data could be asymmetric and non-normal. In such scenarios, skewed-ICAR and nonparametric ICAR approaches could provide better and unbiased estimation of the spatial pattern of diseases.

Stable distributions with applications to South African financial data.
(2024) Naradh, Kimera; Chinhamu, Knowledge; Chifurira, Retius.
In recent times, researchers, analysts and statisticians have shown a keen interest in studying Extreme Value Theory (EVT), particularly with the application to mixture models in the medical and financial sectors. This study aims to validate the use of stable distributions in modelling three Johannesburg Stock Exchange (JSE) market indices, namely the All Share Index (ALSI), Banks Index and the Mining Index, as well as the United States Dollar (USD) to South African Rand (ZAR) exchange rate. This study leverages the unique properties of stable distributions when modelling heavy-tailed data. Nolan’s S0-parameterization stable distribution (SD) was fitted to the returns of the three FTSE/JSE indices and the USD/ZAR exchange rate, and a hybrid Generalized Autoregressive Conditional Heteroskedasticity (GARCH)-type model combined with stable distributions was fitted to each return series.
The two-tailed mixture model combining the Generalized Pareto Distribution (GPD), a stable distribution and the GPD, referred to as GSG, as well as the Stable-Normal-Stable (SNS) and Stable-KDE-Stable (SKS) models, were fitted to evaluate their relative performance in modelling financial data. Results show that the S0-parameterization SD fits the South African financial returns well. The hybrid GARCH(1,1)-SD model competes favourably with the GARCH-GPD model in estimating Value-at-Risk (VaR) for the FTSE/JSE Banks Index, FTSE/JSE Mining Index and USD/ZAR exchange rate returns. The hybrid EGARCH(1,1)-SD competes well against the GARCH-GPD model for the FTSE/JSE ALSI returns. Inconclusive results are observed for the short position of the fitted GPD-KDE-GPD (GKG) models, where KDE is the kernel density estimator; however, in the long position, an appropriate fit of the GKG model is observed for all four return series. The proposed mixture models, namely the GSG, SNS and SKS models, are found to be a good alternative to the commonly used GPD-Normal-GPD (GNG) mixture model in fitting South African financial data. The results of this study are important to financial practitioners, risk managers and researchers, as the proposed mixture models add more value to the literature on the applications of extreme mixture models.

Statistical and machine learning methods of online behaviours analysis.
(2024) Soobramoney, Judah; Chifurira, Retius; Zewotir, Temesgen Tenaw.
The success of corporates is highly influenced by the effectiveness and appeal of each corporate’s website. This study was conducted on TEKmation, a South African corporate, whose board of directors lacked insight regarding the website’s usage. The study aimed to quantify the web-traffic flow, detect the underlying browsing patterns, and validate the web-design effectiveness. The website experienced 7,935 visits and 57,154 page views from 1 June 2021 to 30 June 2023 (data sourced from Google Analytics).
Grubbs’ test identified outliers in visit frequency, the pageviews per visit, and the visit duration per visit. A small degree of missingness was observed in the mobile device branding (1.24%) and operating system (0.03%) features, which were imputed using a Bayesian network model. To address a detected data shift, an artificial neural network (ANN) was proposed to flag future data shifts, with the important predictors being the period of year and volume of sessions. Prior to clustering, feature selection methods assessed the feature variability and feature association. Results indicated that low-incidence webpages and features with natural relationships should be omitted. The K-means, DBSCAN and hierarchical unsupervised machine learning methods were employed to identify the visit personas, labelled get-in-touch (12%), accidentals (11%), dropoffs (30%), engrossed (38%) and seekers (9%). It was evident that the premature drop-offs needed further exploration. The Cox proportional hazards survival model and the random survival forest (RSF) model identified the web browser, visit frequency, device category, distance, certain webpages, volume of hits, and organic searches as drop-off hazards. A tiered Markov chain model was developed to compute the transition probabilities of dropping off. The contact (63%) and clients (50%) states recorded a high likelihood of dropping off early within the visit. In conclusion, using statistical methods, the study informed the board of its audience, identified the flaws of the website and proposed recommendations to address concerns.

Statistical study on childhood malnutrition and anaemia in Angola, Malawi and Senegal.
(2023) Khulu, Mthobisi Christian; Ramroop, Shaun; Habyarimana, Faustin.
Malnutrition and anaemia continue to be a concern to the future of developing countries.
This thesis aimed to examine the risk factors associated with malnutrition and anaemia among children under five years old in Angola, Malawi and Senegal. Statistical models and techniques have improved over the years to give more insight into malnutrition and anaemia, in terms of demographic, socio-economic, environmental, and geographic factors. This thesis also assessed the spatial epidemiological overlaps between childhood malnutrition and anaemia, which can lead to various advantages in intervention planning, monitoring, controlling and total elimination of such diseases, especially in high-risk regions. This is a secondary data analysis in which nationally representative data from the three countries was used. The Demographic and Health Survey data from Angola, Malawi and Senegal were merged to create a pooled sample, which was then used for all the analyses conducted in this study. The relationship between the explanatory variables and malnutrition and anaemia was assessed to obtain variables that explain the two outcomes. Consequently, a generalized linear mixed model (GLMM) was used to investigate the significance of the child-level, community-level and household-level factors for malnutrition and anaemia separately. The relationship between the two diseases was further examined using three joint modelling approaches: (1) a joint generalised linear mixed model; (2) a structural equation model; and (3) a bivariate copula geo-additive model. For each model employed, the significant factors of both malnutrition and anaemia were identified. The GLMM results on malnutrition revealed that children’s place of residence, age, gender, mother’s level of schooling, wealth status, birth interval and birth order significantly explain malnutrition at the 5% level of significance. The GLMM results on anaemia, in turn, revealed that children’s age, gender, mother’s level of schooling, wealth status and nutritional status significantly explain anaemia at the 5% level of significance.
The findings of the copula geo-additive modelling of malnutrition and anaemia indicated that there is an association between malnutrition and anaemia. A strong association between malnutrition and anaemia was observed in the north-west districts of Angola when compared to other districts. The results imply that the policymakers of Angola, Senegal and Malawi can control anaemia through interventions that control malnutrition. The overall findings of this study provide meaningful insight to the policymakers of Angola, Malawi and Senegal, which will lead to the implementation of interventions that can assist in achieving the Sustainable Development Goal (SDG) target of 25 deaths per 1 000 live births by 2030. To properly eradicate all the causes of malnutrition and anaemia, programs such as parental education, financial education, children's dietary focus programs and mobile health facilities could add significant value. The results also highlighted the national priority areas related to child-related factors, household factors and environmental factors for childhood malnutrition and anaemia morbidity control. They also provided policymakers with valuable geographical information for developing and implementing effective interventions. There is a greater need for partnership and collaboration among the studied countries to achieve the SDG target.