Doctoral Degrees (Statistics)

Permanent URI for this collection: https://hdl.handle.net/10413/7126

Recent Submissions

Now showing 1 - 20 of 51
  • Item
    Statistical and machine learning methods of online behaviours analysis.
    (2024) Soobramoney, Judah.; Chifurira, Retius.; Zewotir, Temesgen Tenaw.
    The success of a corporate is strongly influenced by the effectiveness and appeal of its website. This study was conducted on TEKmation, a South African corporate whose board of directors lacked insight into the website’s usage. The study aimed to quantify the web-traffic flow, detect the underlying browsing patterns, and validate the effectiveness of the web design. The website received 7,935 visits and 57,154 page views from 1 June 2021 to 30 June 2023 (data sourced from Google Analytics). Grubbs’ test identified outliers in visit frequency, page views per visit, and visit duration. A small degree of missingness was observed in the mobile device branding (1.24%) and operating system (0.03%) features, which were imputed using a Bayesian network model. To address a detected data shift, an artificial neural network (ANN) was proposed to flag future data shifts, with the period of the year and the volume of sessions as important predictors. Prior to clustering, feature selection methods assessed feature variability and feature association. Results indicated that low-incidence webpages and features with natural relationships should be omitted. The K-means, DBSCAN and hierarchical unsupervised machine learning methods were employed to identify the visit personas, labelled get-in-touch (12%), accidentals (11%), dropoffs (30%), engrossed (38%) and seekers (9%). It was evident that the premature drop-offs needed further exploration. The Cox proportional hazards survival model and the random survival forest (RSF) model identified the web browser, visit frequency, device category, distance, certain webpages, volume of hits, and organic searches as drop-off hazards. A tiered Markov chain model was developed to compute the transition probabilities of dropping off. The contact (63%) and clients (50%) states recorded a high likelihood of dropping off early in the visit.
In conclusion, using statistical methods, the study informed the board about its audience and the flaws of the website, and proposed recommendations to address these concerns.
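The drop-off transition probabilities mentioned in the abstract above can be illustrated with a plain first-order Markov chain estimated from page sequences. This is a minimal sketch, assuming each visit is an ordered list of page labels followed by an absorbing `exit` state. The page labels and visits are hypothetical, not the thesis data, and the thesis's tiered model is not reproduced here.

```python
from collections import Counter, defaultdict


def transition_probabilities(visits):
    """Estimate first-order Markov transition probabilities from visit paths."""
    counts = defaultdict(Counter)
    for path in visits:
        # Append an absorbing "exit" state so that dropping off
        # is counted as an explicit transition.
        states = list(path) + ["exit"]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        # Row-normalise the counts to obtain maximum likelihood estimates.
        probs[state] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


# Hypothetical visits (illustrative page labels only).
visits = [
    ["home", "contact"],
    ["home", "clients", "contact"],
    ["home"],
    ["clients"],
]
P = transition_probabilities(visits)
drop_off_from_clients = P["clients"]["exit"]
```

Each row of `P` sums to one, and `P[state]["exit"]` is the estimated probability of dropping off directly from that page.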
  • Item
    Stable distributions with applications to South African financial data.
    (2024) Naradh, Kimera.; Chinhamu, Knowledge.; Chifurira, Retius.
    In recent times, researchers, analysts and statisticians have shown a keen interest in Extreme Value Theory (EVT), particularly its application to mixture models in the medical and financial sectors. This study aims to validate the use of stable distributions in modelling three Johannesburg Stock Exchange (JSE) market indices, namely the All Share Index (ALSI), the Banks Index and the Mining Index, as well as the United States Dollar (USD) to South African Rand (ZAR) exchange rate. The study leverages the unique properties of stable distributions for modelling heavy-tailed data. Nolan’s S0-parameterization stable distribution (SD) was fitted to the returns of the three FTSE/JSE indices and the USD/ZAR exchange rate, and a hybrid Generalized Autoregressive Conditional Heteroskedasticity (GARCH)-type model combined with stable distributions was fitted to each return series. Two-tailed mixture models, namely the GPD-Stable-GPD (GSG) model, where GPD is the Generalized Pareto Distribution, as well as the Stable-Normal-Stable (SNS) and Stable-KDE-Stable (SKS) models, were fitted to evaluate their relative performance in modelling financial data. Results show that the S0-parameterization SD fits the South African financial returns well. The hybrid GARCH(1,1)-SD model competes favourably with the GARCH-GPD model in estimating Value-at-Risk (VaR) for the FTSE/JSE Banks Index, FTSE/JSE Mining Index and USD/ZAR exchange rate returns. The hybrid EGARCH(1,1)-SD model competes well against the GARCH-GPD model for the FTSE/JSE ALSI returns. Inconclusive results are observed for the short position of the fitted GPD-KDE-GPD (GKG) models, where KDE is the kernel density estimator; however, in the long position, an appropriate fit of the GKG model is emphasised for all four return series.
The proposed mixture models, namely the GSG, SNS and SKS models, are found to be good alternatives to the commonly used GPD-Normal-GPD (GNG) mixture model for fitting South African financial data. The results of this study are important to financial practitioners, risk managers and researchers, as the proposed mixture models add to the literature on the applications of extreme mixture models.
  • Item
    Analysis of discrete time competing risks data with missing failure causes and cured subjects.
    (2023) Ndlovu, Bonginkosi Duncan.; Zewotir, Temesgen Tenaw.; Melesse, Sileshi Fanta.
    This thesis is motivated by the limitations of the existing discrete time competing risks models in the treatment of data that come with missing failure causes or a sizable proportion of cured subjects. The discrete time models that have been suggested to date (Davis and Lawrance, 1989; Tutz and Schmid, 2016; Ambrogi et al., 2009; Lee et al., 2018) are cause-specific-hazard denominated. This fact disqualifies these models from consideration if the data come with missing failure causes. It is also well documented that naive application of the cause-specific hazards to data with a sizable proportion of cured subjects may produce downward-biased estimates of these quantities. The existing models can be considered within the multiple imputation framework (Rubin, 1987) for handling missing failure causes, but the prospects of scaling them up to handle cured subjects are minimal, if not nil. In this thesis we address these issues concerning the treatment of missing failure causes and cured subjects in discrete time settings. To that end, we focus on the mixture model (Larson and Dinse, 1985) and the vertical model (Nicolaie et al., 2010) because these models possess certain properties that dovetail with the objectives of this thesis. The mixture model has been upgraded into a model that can handle cured subjects. Nicolaie et al. (2015) have demonstrated that the vertical model can also handle missing failure causes as is. Nicolaie et al. (2018) have also extended the vertical model to deal with cured subjects. Our strategy in this thesis is to use both the mixture model and the vertical model as a launching pad to advance discrete time models for handling data that come with missing failure causes or cured subjects.
  • Item
    Discrete time-to-event construction for multiple recurrent state transitions.
    (2023) Batidzirai, Jesca Mercy.; Manda, Samuel.; Mwambi, Henry Godwell.
    Recent developments in multi-state models have considered discrete time rather than continuous time in the modeling of transition intensities, because the major drawback of continuous-time models is the possibility of biased parameter estimates arising from the handling of ties. Discrete-time models have included univariate multilevel models to account for possible dependence among specific pairwise recurrent transitions within the same subject. However, in most cases, there would be several specific pairwise transitions of interest. In such cases, there is a need to model the transitions with the aim of identifying those transitions that are correlated. This provides insight into how the transitions are related to each other. In order to investigate the interdependencies between transitions, the unique contribution of this thesis is to propose a multivariate discrete-time multi-state model with multiple state transitions. In this model, each specific recurrent transition is associated with a random effect to capture possible dependence among transitions of the same type or of different types. The random effects themselves were then modeled by a multivariate normal distribution, and model parameters were estimated using maximum likelihood methods with numerical integration by Gaussian quadrature. A simulation study was done to evaluate the performance of the proposed model. The model yielded satisfactory results for most fixed effects and random effects estimates. This is evidenced by near-zero biases and mean square errors of the average estimates, as well as high coverage probabilities of the 95% confidence intervals from 1000 replications. The proposed methodology was applied to marriage formation and dissolution data from KwaZulu-Natal province, South Africa. Five transitions were considered, namely: Never Married to Married, Married to Separated, Married to Widowed, Separated to Married and Widowed to Married.
Very small unobserved subject-to-subject heterogeneity for each transition and a weak positive correlation between transitions were found. Statistically, the model produced smaller standard errors than the univariate models, hence its estimates are more precise. The multivariate modeling of discrete time-to-event models provides a better understanding of the evolution of all transitions simultaneously, thus giving, in addition to covariate effects, an assessment of how one transition is associated with another. Empirical results confirmed well-known important socio-demographic predictors of entering and exiting a marriage. Age at sexual debut played a critical positive role in most of the transitions. More educated subjects were associated with a lower likelihood of entering a first marriage, experiencing a marital dissolution, and remarrying after widowhood. Subjects who had a sexual debut at younger ages were more likely to experience a marital dissolution than those who started late. Age at first marriage had a negative association with marital dissolution. We may, therefore, postulate that existing programs that encourage delay in the onset of sexual activity, for example for HIV risk reduction, may also have a positive impact on lowering rates of marital dissolution, thus ultimately improving psychological and physical health.
  • Item
    Bayesian spatio-temporal and joint modelling of malaria and anaemia among Nigerian children aged under five years, including estimation of the effects of risk factors.
    (2023) Ibeji, Jecinta Ugochukwu.; Mwambi, Henry Godwell.; Iddrisu, Abdul-Karim.
    Childhood mortality and morbidity in Nigeria have been linked to malaria and anaemia. This thesis focused on exploring the risk factors and the complexity of the relationship between malaria and anaemia in Nigerian children under 5. Data from the 2010 and 2015 Nigeria Malaria Indicator Survey conducted by Demographic Health Survey were used. In 2010, the prevalence of malaria and anaemia was 48% and 72%, respectively, while in 2015 the respective prevalences were 27% and 68%. Machine learning-based exploratory classification methods were used to explain the relationship and patterns between the independent variables and the two dependent variables, namely malaria and anaemia. Decisions made by the public health body are centered on the administrative units (i.e., states) within the country. Therefore, the development of disease mapping was described, together with a brief overview of limiting assumptions and ways of tackling them. Consequently, the spatial variation of malaria and anaemia for 2010 and 2015 was analyzed with the inclusion of their respective risk factors. A separate multivariate hierarchical Bayesian logistic model for each disease was adopted to investigate the spatial pattern of malaria and anaemia and adjust for the risk factors associated with each disease. Furthermore, a multilevel model analysis was applied to independently investigate the spatio-temporal distribution of malaria and anaemia. A joint model was further adopted to check for the relationship between malaria and anaemia and their common risk factors and relax the nonlinearity assumption. In the 2010 data, type of place of residence, mother’s highest educational level, source of drinking water, type of toilet facility, child’s sex, main floor material, and households that have electricity, radio, television, and water were significantly associated with malaria and anaemia.
In the 2015 data, the type of place of residence, source of drinking water, type of toilet facility, households with radio, main roof material, wealth index, child’s sex, and mother’s highest educational level had a significant relationship with malaria and anaemia. The results from this study can guide policymakers in tailoring effective interventions to reduce or prevent malaria and anaemia. This will help to distribute limited state health system resources, such as personnel, funds and facilities, adequately within the country.
  • Item
    Statistical study on childhood malnutrition and anaemia in Angola, Malawi and Senegal.
    (2023) Khulu, Mthobisi Christian.; Ramroop, Shaun.; Habyarimana, Faustin.
    Malnutrition and anaemia continue to be a concern for the future of developing countries. This thesis aimed to examine the risk factors associated with malnutrition and anaemia among children under five years of age in Angola, Malawi and Senegal. Statistical models and techniques have improved over the years to give more insight into malnutrition and anaemia in terms of demographic, socio-economic, environmental, and geographic factors. This thesis also assessed the spatial epidemiological overlaps between childhood malnutrition and anaemia, which can benefit intervention planning, monitoring, control and total elimination of such diseases, especially in high-risk regions. This is a secondary data analysis in which nationally representative data from the three countries were used. The Demographic and Health Survey data from Angola, Malawi and Senegal were merged to create a pooled sample, which was then used for all the analyses conducted in this study. The relationship between the explanatory variables and malnutrition and anaemia was assessed to obtain variables that explain the two outcomes. Consequently, a generalized linear mixed model (GLMM) was used to investigate the significance of the child-level, community-level and household-level factors for malnutrition and anaemia separately. The relationship between the two diseases was further examined using three joint modelling approaches: (1) a joint generalised linear mixed model; (2) a structural equation model; and (3) a bivariate copula geo-additive model. For each model employed, the significant factors of both malnutrition and anaemia were identified. The GLMM results on malnutrition revealed that children’s place of residence, age, gender, mother’s level of schooling, wealth status, birth interval and birth order significantly explain malnutrition at the 5% level of significance.
The GLMM results on anaemia, in turn, revealed that children’s age, gender, mother’s level of schooling, wealth status and nutritional status significantly explain anaemia at the 5% level of significance. The findings of the copula geo-additive modelling indicated that there is an association between malnutrition and anaemia. A stronger association between malnutrition and anaemia was observed in the north-west districts of Angola than in other districts. The results imply that the policymakers of Angola, Senegal and Malawi can control anaemia through interventions that control malnutrition. The overall findings of this study provide meaningful insight to the policymakers of Angola, Malawi and Senegal, which will lead to the implementation of interventions that can assist in achieving the Sustainable Development Goal (SDG) target of 25 deaths per 1 000 live births by 2030. To properly eradicate all the causes of malnutrition and anaemia, programs such as parental education, financial education, children’s dietary focus programs and mobile health facilities could add significant value. The results also highlighted the national priority areas related to child-related factors, household factors and environmental factors for childhood malnutrition and anaemia morbidity control. They also provided policymakers with valuable geographical information for developing and implementing effective interventions. There is a greater need for partnership and collaboration among the studied countries to achieve the SDG target.
  • Item
    Flexible Bayesian hierarchical spatial modelling in disease mapping.
    (2022) Ayalew, Kassahun Abere.; Manda, Samuel.
    The Gaussian Intrinsic Conditional Autoregressive (ICAR) spatial model, which usually has two components, namely an ICAR component for spatial smoothing and standard random effects for non-spatial heterogeneity, is used to estimate spatial distributions of disease risks. The normality assumption in this model may not always be correct, and misspecification of the distribution of the random effects could result in biased estimation of the spatial distribution of disease risk, which could lead to misleading conclusions and policy recommendations. Few studies have compared the estimation of the spatial distributions of diseases under the ICAR-normal model with that obtained from fitting ICAR-nonnormal models. The results from these studies indicated that the ICAR-nonnormal models performed better than the ICAR-normal model in terms of accuracy, efficiency and predictive capacity. However, these efforts have not fully addressed the effect on the estimation of spatial distributions under flexible specification of ICAR models in disease mapping. The overall aim of this PhD thesis is to develop approaches that relax the normality assumption that is often used in modeling and fitting of ICAR models in the estimation of spatial patterns of diseases. In particular, the thesis considers the skew-normal and skew-Laplace distributions under the univariate specification, and the skew-normal distribution under the multivariate specification, to estimate the spatial distributions of either univariable or multivariable areal data. The thesis also considers a non-parametric specification of the multivariate spatial effects in the ICAR model, which is a novel extension of earlier work. The models were estimated using Bayesian statistical approaches.
The performance of our suggested alternatives to the ICAR-normal model is evaluated through simulation studies as well as a practical application to the estimation of the district-level distribution of HIV prevalence and treatment coverage using health survey data in South Africa. Results from the simulation studies and the analysis of real data demonstrated that our approaches performed better in the prediction of spatial distributions for univariable and multivariable areal data in disease mapping. This PhD work shows the limitations of relying on the ICAR-normal model for the estimation of spatial distributions in all spatial analyses, even when the data could be asymmetric and non-normal. In such scenarios, skewed-ICAR and nonparametric ICAR approaches could provide better and unbiased estimation of the spatial pattern of diseases.
  • Item
    Financial modelling of cryptocurrency: a case study of Bitcoin, Ethereum, and Dogecoin in comparison with JSE stock returns.
    (2022) Kaseke, Forbes.; Ramroop, Shaun.; Mwambi, Henry Godwell.
    The emergence of cryptocurrency has caused a shift in the financial markets. Although it was created as a currency for exchange, cryptocurrency has been shown to be an asset, with investors seeking to profit from it rather than using it as a medium of exchange. As a financial asset, cryptocurrency exhibits distinct stylised facts, like any other asset. Studying these stylised facts allows the creation of better-suited models to assist investors in making better data-driven decisions. The data used in this thesis covered three leading cryptocurrencies, namely Bitcoin, Ethereum, and Dogecoin, with Johannesburg Stock Exchange (JSE) data used as a basis for comparison. The sample period was from 18 September 2017 to 27 May 2021. The goal was to research the stylised facts of cryptocurrencies and then create models that capture these stylised facts. The study developed risk-quantifying models for cryptocurrencies. The main findings were that cryptocurrency exhibits stylised facts that are well known in financial data. However, the magnitude and frequency of these stylised facts tend to differ. For example, cryptocurrency returns are more volatile than stock returns, and the volatility tends to be more persistent than in stocks. The study also finds that cryptocurrency has a reverse leverage effect, in contrast to the normal leverage effect in which past negative returns increase volatility more than past positive returns. The study also developed a hybrid GARCH model using extreme value theory for quantifying cryptocurrency risk. The results showed that the GJR-GARCH model with GPD innovations could be used as an alternative model to calculate the VaR. The volatility of cryptocurrency was also compared with that of the JSE, both with and without accounting for structural breaks. The results showed that the cryptocurrencies’ volatility patterns are similar to one another but differ from those of the JSE. The cryptocurrency market was also found to be inefficient.
This finding means that some investors can take advantage of this inefficiency. The study also revealed that structural breaks affect volatility persistence. However, this persistence measure differs depending on the model used. Markov switching GARCH models were used to strengthen the structural break findings. The results showed that two-regime models outperform single-regime models. The VAR and DCC-GARCH models were also used to test for spillovers amongst the assets used. The results showed short-run spillovers from Bitcoin to Ethereum and long-run spillovers based on the DCC-GARCH. Lastly, factors affecting cryptocurrency adoption were discussed. The main factors hindering mass adoption are the complexity that comes with the use of cryptocurrency and its high volatility. This study is important as it gives investors an understanding of the nature and behaviour of cryptocurrency so that they know when and how to invest. It also helps policymakers and financial institutions decide how to treat or use cryptocurrency within the economy.
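For reference, the leverage effect discussed in the abstract above is usually expressed through the asymmetry term of a GJR-GARCH(1,1) model. The standard textbook form is given below; the notation is an assumption made here for illustration and is not taken from the thesis.

```latex
\sigma_t^2 = \omega + \left(\alpha + \gamma\, I_{t-1}\right)\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2,
\qquad
I_{t-1} =
\begin{cases}
1, & \varepsilon_{t-1} < 0,\\
0, & \text{otherwise.}
\end{cases}
```

With $\gamma > 0$, negative shocks raise the conditional variance more than positive shocks of the same size (the usual leverage effect); an estimated $\gamma < 0$ corresponds to the reverse pattern.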
  • Item
    Flexible Bayesian hierarchical spatial modeling in disease mapping.
    (2022) Ayalew, Kassahun Abere.; Manda, Samuel.
    The Gaussian Intrinsic Conditional Autoregressive (ICAR) spatial model, which usually has two components, namely an ICAR component for spatial smoothing and standard random effects for non-spatial heterogeneity, is used to estimate spatial distributions of disease risks. The normality assumption in this model may not always be correct, and misspecification of the distribution of the random effects could result in biased estimation of the spatial distribution of disease risk, which could lead to misleading conclusions and policy recommendations. Few studies have compared the estimation of the spatial distributions of diseases under the ICAR-normal model with that obtained from fitting ICAR-nonnormal models. The results from these studies indicated that the ICAR-nonnormal models performed better than the ICAR-normal model in terms of accuracy, efficiency and predictive capacity. However, these efforts have not fully addressed the effect on the estimation of spatial distributions under flexible specification of ICAR models in disease mapping. The overall aim of this PhD thesis was to develop approaches that relax the normality assumption that is often used in modeling and fitting of ICAR models in the estimation of spatial patterns of diseases. In particular, the thesis considered the skew-normal and skew-Laplace distributions under the univariate specification, and the skew-normal distribution under the multivariate specification, to estimate the spatial distributions of either univariable or multivariable areal data. The thesis also considered a non-parametric specification of the multivariate spatial effects in the ICAR model, which is a novel extension of earlier work. The models were estimated using Bayesian statistical approaches.
The performance of our suggested alternatives to the ICAR-normal model was evaluated through simulation studies as well as a practical application to the estimation of the district-level distribution of HIV prevalence and treatment coverage using health survey data in South Africa. Results from the simulation studies and the analysis of real data demonstrated that our approaches performed better in the prediction of spatial distributions for univariable and multivariable areal data in disease mapping. This PhD has shown the limitations of relying on the ICAR-normal model for the estimation of spatial distributions in all spatial analyses, even when the data could be asymmetric and non-normal. In such scenarios, skewed-ICAR and nonparametric ICAR approaches could provide better and unbiased estimation of the spatial pattern of diseases.
  • Item
    Statistical modelling on childhood anaemia, malaria and stunting in Malawi, Lesotho, and Burundi.
    (2023) Gaston, Rugiranka Tony.; Ramroop, Shaun.; Habyarimana, Faustin.
    This research aimed to develop and extend statistical models in the discipline of biostatistics, with a focus on childhood anaemia, malaria, and stunting. Malaria, anaemia, and stunting continue to be public health issues worldwide in both industrialised and underdeveloped countries, particularly in children younger than 5 years (Osazuwa and Ayo, 2010; Kanchana et al., 2018). They are most dangerous in children from underdeveloped nations, where they remain the biggest contributors to morbidity and mortality. In addition, anaemia, malaria, and stunting are associated and, if not treated in time, can damage children’s emotional, physical, and mental status and lead to poor performance at school (Gaston et al., 2022). The current study evaluates the link between anaemia, stunting, and malaria simultaneously. Furthermore, the study assessed whether socioeconomic, geographical, environmental, and child demographic variables have a significant effect on childhood malaria, anaemia, and stunting. The study used national secondary cross-sectional data from the Malawi Malaria Indicator Survey (MMIS), the Lesotho Demographic Health Survey (LDHS), and the Burundi Demographic Health Survey (BDHS). The data were collected using multi-stage, stratified, cluster sampling with unequal selection probabilities. For this reason, we first used the survey logistic regression model in Chapter 3, which accounted for the complexity of the sampling design and the heterogeneity between observations from the same cluster. However, this model includes only fixed effects and does not have the option of adding random effects to model the correlation between observations. In Chapter 4, we extend the model to a generalised additive mixed model (GAMM) to include the random effect.
The GAMM is also an extension of the generalised linear mixed model (GLMM) and enables the parametric fixed effects of the GLMM to be modelled non-parametrically using additive smooth functions. These models were applied to single response variables, whereas we wanted to evaluate the relationship that might exist between anaemia, stunting, and malaria. We therefore explore the multivariate joint model under the GLMM in Chapter 5 to jointly model either malaria and anaemia or anaemia and stunting. Finally, we introduce a structural equation model (SEM) in Chapter 6 to evaluate the complex interrelationships between socioeconomic, demographic, and environmental factors, as well as their direct or indirect relationships with childhood malaria, anaemia and stunting co-morbidity. The previous chapters could not address these interrelationships among the variables of interest. Each model used in this study has its weaknesses and strengths, which can depend on the goal of the researcher. However, the multivariate model under the GLMM and the structural equation model were found to be more adaptive and attractive to researchers interested in innovative scientific research. The findings from this study revealed that the child’s nutrition status, age, whether the child had fever or diarrhoea, altitude, place of residence, toilet facility, access to electricity, whether the child slept under a mosquito bed net the night before the survey, mother’s education level, and mother’s body mass index have a significant effect on both childhood anaemia and malaria. The age of the child, the mother’s educational status, place of residence, wealth index, and the child’s weight at birth were the determinants of stunting or malnutrition. The findings also indicated that the geographical, geophysical, environmental, household and child demographic factors were statistically significant and have either a direct or an indirect effect on childhood co-morbidity factors.
The geographical factors were statistically significant and had a positive direct effect on childhood malaria, anaemia, and stunting. The estimated indirect path for the impact of geophysical factors on childhood co-morbidity factors, as mediated by household factors, was statistically significant and positive. However, the estimated indirect paths for the effect of geophysical factors on childhood co-morbidity factors, as mediated by environmental factors, were statistically significant but negative. The child demographic factors revealed a direct statistically significant impact on childhood co-morbidity factors. Furthermore, their estimated indirect effect on childhood co-morbidity, as mediated by household factors, was statistically significant and negative. Moreover, household and environmental factors indicate a positive direct effect on the childhood co-morbidity of anaemia, malaria, and stunting. Finally, the results of this study revealed a positive relationship between stunting, anaemia, and malaria. This means that malaria, anaemia, and stunting increase or decrease together. Hence, controlling one or two of malaria, anaemia, and stunting can reduce the effect of the other(s), which can assist policymakers and government in the allocation of financial resources to fight the childhood co-morbidity of anaemia, malaria, and stunting. Furthermore, understanding the link between anaemia, malaria, and stunting, and the other factors associated with them, will assist in focusing on those areas and go a long way toward achieving United Nations Sustainable Development Goal 3 (SDG 3), namely the complete elimination of under-5 mortality by 2030.
  • Item
    Appraising South African residential property and measuring price developments.
    (2022) Bax, Dane Gregory.; Zewotir, Temesgen Tenaw.; North, Delia Elizabeth.
    Housing wealth is well established as one of the most important sources of wealth for households and investors. However, owning a home is a fundamental human need, making the monitoring of residential property prices a social endeavour as well as an economic one, especially in times of economic uncertainty. Residential property prices also have a direct effect on the macroeconomy through wealth effects, whereby households increase consumption as gains in equity strengthen their balance sheets. Collecting correct and adequate data is vitally important in analysing property market movements and developments, particularly given globalization and the interlinked nature of financial markets. Although measuring residential property price developments is an important economic and social activity, matching properties over time is extremely difficult because the sale of homes is typically infrequent, characteristics vary, and homes are uniquely located in space. This thesis focuses on appraising several residential property types located throughout South Africa from January 2013 to August 2017, investigating different modelling approaches with the aim of developing a residential property price index. Various methods exist to create residential property price indices; however, hedonic models have proven useful as a quality-adjusted approach in which pure price changes are measured and not simply changes in the composition of samples over time. Before fitting any models to appraise homes, an autoencoder was built to detect anomalous data arising from human error at the data entry stage. The autoencoder identified improbable data, resulting in a final data set of 415 200 records once duplicate records were identified and removed. This study first investigated generalised linear models as a candidate approach to appraise homes in South Africa, which showed possible alternatives to the ubiquitous log-linear model.
Relaxing functional form assumptions and considering the nested locational structure of homes, hierarchical generalised linear models were considered as the next candidate method. Partitioning around medoids was applied to find additional spatial groupings, which were treated as random effects along with the suburb. The findings showed that the marginal utility of structural attributes was non-linear and that smooth functions of covariates were an appropriate treatment. Furthermore, the use of random effects helped account for the spatial heterogeneity of homes through partial pooling. Finally, machine learning algorithms were investigated because they make minimal assumptions about the data generating process and allow for complex non-linear and interaction effects. Random forests, gradient boosted machines and neural networks were adopted to fit these appraisal functions. The gradient boosted machines had the best goodness of fit, showing non-linear relationships between the structural characteristics of homes and listing prices. Partial dependence plots were able to quantify the marginal utility over the distributions of different structural characteristics. The results show that larger homes do not necessarily yield a premium and that diminishing returns are evident, similar to the results of the hierarchical generalised additive models. The variable importance plots showed that location was the most important predictor, followed by the number of bathrooms and the size of a home. The gradient boosted machines achieved the lowest out-of-sample error and were used to develop the residential property price index. A chained, dual imputation Fisher index was applied to the gradient boosted machines, showing nominal and real price developments at a country and provincial level. The chained, dual imputation Fisher index provided less noisy estimates than a simple median mix-adjusted index. 
Although listing prices were used and not transacted prices, the trend was similar to the ABSA Global Property Guide. In order to make this research useful to property market participants, a web application was developed to show how the proposed methodology can be democratised by property portals and real estate agencies. The Listing Price Index Calculator was created to easily communicate the results through a front-end interface, showing how property portals and real estate agencies can leverage their data to aid sellers in determining listing prices to go to market with, help buyers obtain an average estimate of the home they wish to purchase and guide property market participants on price developments.
  • Item
    Longitudinal clinical covariates influence on CD4+ cell count after seroconversion.
    (2019) Tinarwo, Partson.; Zewotir, Temesgen Tenaw.; North, Delia Elizabeth.
    The Acquired Immunodeficiency Syndrome (AIDS) pandemic is a global challenge. The human immunodeficiency virus (HIV) is notorious for weakening the immune system and opening channels for opportunistic infections. Cluster of Differentiation 4 (CD4+) cells are the main cells killed by HIV and are hence used as a health indicator for HIV-infected patients. In the past, CD4+ count diagnostics were very expensive and therefore beyond the reach of many in resource-limited settings. Accordingly, the CD4+ count's clinical covariates were the potential diagnostic tools. From a different angle, it is essential to examine a trail of the clinical covariates affecting the CD4+ cell response. That is, inasmuch as the immune system regulates the CD4+ count fluctuations in reaction to the viral invasion, the body's other complex functional systems are bound to adjust too. However, little is known about the corresponding adaptive behavioural patterns of the clinical covariates' influence on the CD4+ cell count. The investigation in this study was carried out on data obtained from the Centre for the AIDS Programme of Research in South Africa (CAPRISA), where initially HIV-negative patients were enrolled into different cohorts for different objectives. These HIV-negative patients were then followed up in their respective cohort studies. As soon as a patient seroconverted in any of the cohort studies, the patient was enrolled again into a new cohort of HIV-positive patients only. The follow-up on the seroconvertants involved a simultaneous recording of repeated measurements of the CD4+ count and 46 clinical covariates. An extensive exploratory analysis was consequently performed with three variable reduction methods for high-dimensional longitudinal data to identify the strongest clinical covariates. The sparse partial least squares approach proved to be the most appropriate and robust technique to adopt. 
It identified the 18 strongest clinical covariates, which were subsequently used to fit other sophisticated statistical models, including longitudinal multilevel models for assessing inter-individual variation in the CD4+ count due to each clinical covariate. Generalised additive mixed models were then used to gain insight into the CD4+ count trends and possible adaptive optimal set-points of the clinical covariates. To single out break-points in the change of linear relationships between the CD4+ count and the covariates, segmented regression models were employed. To understand the highly complex and intertwined relationships between the CD4+ count, the clinical covariates and the time-lagged effects during HIV disease progression, a Structural Equation Model (SEM) was constructed and fitted. The results showed that sodium consistently changed its effects at 132 mEq/L and 140 mEq/L across all the post-HIV-infection phases. Generally, the covariate influence on the CD4+ count varied with infection phase and widely between individuals during anti-retroviral therapy (ART). We conclude that there is evidence of covariate set-point adaptive behaviour to positively influence the CD4+ cell count during HIV disease progression.
  • Item
    Statistical methods for causal inference in observational studies.
    (2020) Amusa, Lateef Babatunde.; Zewotir, Temesgen Tenaw.; North, Delia Elizabeth.
    Estimating causal effects is essential in the evaluation of a treatment or intervention. It is particularly straightforward for well-designed experiments. However, when the treatment assignment is complicated by confounders, as in the case of observational studies, inferences regarding the treatment effects require more sophisticated adjustments. In this thesis, we investigated different matching techniques in terms of how well they balance the treatment groups on the covariates, as well as their efficiency in estimating treatment effects. We considered the various algorithm variants of these matching techniques, which include propensity score matching, Mahalanobis distance matching, and coarsened exact matching. Secondly, we proposed two new strategies for estimating treatment effects, namely, covariate-balancing rank-based Mahalanobis distance (CBRMD) and an improved version of CBRMD (iCBRMD). We evaluated their performance via simulations and some real-life datasets. Thirdly, we investigated a relatively new optimization-based alternative, known as entropy balancing, which has rarely been used in the applied biomedical literature. We shared the lessons learned from using entropy balancing in non-experimental studies, via Monte Carlo simulations and an empirical application. We further extended the evaluation of entropy balancing to some standard measures of causal treatment effects, namely difference in means, odds ratios, rate ratios and hazard ratios. We pulled together our evaluations by conducting Monte Carlo simulations, evaluating both well-established methods and the more recently proposed methods. These adjustment techniques were evaluated under different scenarios that align with practical reality. Finally, we utilized a dataset from a recently conducted HIV Incidence Provincial Surveillance System (HIPSS) study to apply the considered techniques to a public health issue in South Africa.
  • Item
    Statistical Modeling of Acute HIV Infection from a Cohort of High-risk Individuals in South Africa = Ukufakwa kwamamodeli Ezibalomininingo Ekuthelelekeni Kwesikhashana nge-HIV Eqoqweni labantu Abasengcupheni Enkulu eNingizimu Afrika.
    (2022) Yirga, Ashenafi Argaw.; Melesse, Sileshi Fanta.; Mwambi, Henry Godwell.; Ayele, Dawit Getnet.
    In this dissertation, longitudinal data modeling approaches to analyze data on CD4 cell counts measured repeatedly in HIV-infected patients enrolled in the Centre for the AIDS Programme of Research in South Africa are investigated. Longitudinal data, or repeated measurement data, is a specific form of multilevel data. In longitudinal studies, repeated observations are made on an individual on one or more outcomes, including covariate information at baseline and over time. Mixed-effects models have become popular for modeling longitudinal data. This statistical procedure also permits the estimation of variability in hierarchically structured data and examines the impacts of factors at different levels. Since longitudinal studies are often faced with incomplete data due to partially observed subjects, the mixed-effects model is by its very nature able to deal with unbalanced data of this kind. Therefore, the study adopts the mixed-effects model and identifies whether specific clinical and sociodemographic factors present in the data influenced CD4 count in a cohort of HIV-infected patients. Since it is of great interest for a biomedical analyst or an investigator to correctly model the CD4 cell count or disease biomarkers of a patient in the presence of covariates or factors determining the disease progression over time, the Poisson regression approach, which explains variability in counts, is considered. The Poisson generalized mixed-effects models can be an appropriate choice for repeated count data. However, this model is not realistic because of the restriction that the mean and variance are equal. Therefore, the Poisson mixed-effects model is replaced by the negative binomial mixed-effects model. The latter model effectively managed over-dispersion of the longitudinal data. We evaluate and compare the proposed models and their application to model CD4 cell counts of HIV-infected patients recruited in the study data set. 
The results reveal that the negative binomial mixed-effects model has appropriate properties and outperforms the Poisson mixed-effects model in terms of handling the over-dispersion of the data. Multiple imputation techniques are also used to handle missing values in the dataset and to validate parameter estimates of the negative binomial mixed-effects model, assuming a missing-at-random mechanism. To illustrate the full conditional distribution of the repeated outcome, a quantile mixed-effects model is employed. This gives more inclusive statistical modeling than conventional ordinary mixed models. Quantile regression offers an invaluable tool to discern effects that would be missed by other conventional regression models, which are based solely on modeling the conditional mean. The quantile regression model that assumes an asymmetric Laplace distribution for the error term was applied to longitudinal CD4 count data. The exact maximum likelihood estimation of the covariate effects and variance-covariance elements in the quantile mixed-effects model was implemented using the Stochastic Approximation Expectation-Maximization algorithm. In the model, multiple random effects are also incorporated to account for the correlation among the observations. Thus, we obtain robust parameter estimates for various positions of the conditional distribution, which communicate a more inclusive and complete picture of the effects. Furthermore, to gain more insight into the functional relationship between the response variable and the covariates, generalized additive mixed-effects models are applied, such as the additive negative binomial mixed-effects model, a versatile model used to better understand and analyze complex nonlinear trajectories in overdispersed longitudinal data. 
Following the additive negative binomial mixed-effects model, an attempt to fit additive quantile mixed-effects model, an efficient and flexible framework for nonparametric as well as parametric longitudinal forms of data analysis focused on features of the outcome beyond its central tendency, was made. The response variable at hand is a CD4 count of HIV-infected patients as a function of Highly Active Antiretroviral Therapy initiation and other relevant baseline characteristics of the patients. Thus, even though this is a biostatistics methodological dissertation research, some interesting clinical and sociodemographic findings are also discussed. Discussion and conclusion of the results from the proposed models with a suggestion of possible further research avenues completed the study. Iqoqa Kule dizetheshini izindlelasu zokulinganisela imininingo eziyilongitudinal modeling approaches ukuhlaziya imininingo yezibalo zamasosha e-CD4 count ezikalwa ngokuphindaphindeka ezigulini ezitheleleke nge-HIV ezibhalise eSikhungweni soHlelo Lokucwaninga nge-AIDS eNingizimu Afrika kuyaphenywa. Imininingo enqumile, noma imininingo ekalwa ngokuphindelela, iwuhlobo oluqondile lwemininingo emazingeni ahlukene. Ocwaningweni olunqumile, ukubheka okuphindwayo kwenziwa kumuntu oyedwa emphumeleni owodwa noma engaphezulu, okufaka ulwazi lwamakhovariyenti njengesisekelo nangokuhamba kwesikhathi. Amamodeli anemithelela exubile aseyathandeka ekuhambiselaneni nemininingo yemodeli enqumile. Inqubo yezibalomininingo iphinde ivumele ukuqagula ukuguquguquka emininingweni enomumo onokugibelana iphinde ihlole imithelela yezimo emazingeni ehlukene. 
Njengoba ucwaningo olunqumile luvame ukubhekana nokungaphothulwa kwemininingo ngenxa yabantu ababhekwe ingxenye, imodeli enomthelelangxube ngokomumo wayo iyakwazi ukubhekana nemininingo engabhalansile eyilolu hlobo bese luhlonza izimo zokokusebenza kwengqondo nomumoqoqobantu emphakathini emininingweni ethinta i-CD4 count eqoqweni leziguli ezitheleleke nge-HIV. Njengoba kunentshisekelo enkulu ukuba umhlaziyi wezempilokwelapha noma umphenyi enze imodeli ngendlela ukubalwa kwamasosha i-CD4 count noma amabhayomakha esifo esigulini ebukhoneni bamakhovariyenti noma izimo ezihlonza ukuqhubeka kwesifo ngokuhamba kwesikhathi, indlelasu yokunqandeka kwesifo ngokukaPoisson, okuchaza ukuguquguquka ngokubalwa, kuyabhekwa. Amamodeli kaPoisson abekwe eceleni anemithelela exubile angaba wukukhetha okuyikho kwemininingo yokubala kokuphindelela. Kodwa, le modeli ayivezi okuyikho ngenxa yokuvimbeleka ukuthi imini nevariyenti kuyalingana. Ngakho-ke, imodeli kaPoisson enemithelelangxube imelwe yimodeli enemithelelangxube yebhayinomiyali engeyinhle. Imodeli yakamuva ilawula ngempumelelo yokusabalalisa kakhulu imininingo enqumile. Sihlaziya siphinde siqhathanise namamodeli aphakanyisiwe nokusetshenziswa kwayo ekubaleni amasosha omzimba i-CD4 cell count yeziguli ezitheleleke nge-HIV abafakwe ekubambeni iqhaza emininingweni yocwaningo. Imiphumela iveza ukuthi imodeli yemithelelangxube yebhayinomiyali engeyinhle enezakhiwomumo eyiwo nesebenza yedlule ekaPoisson nemodeli enomthelelangxube ngokwemigomo yokubheka ukusabalalisa ngokweqile imininingo. Amasu amaningi emvezabubi asetshenziselwa ukubhekana nezimo ezingabonakali kwisethi yemininingo ehlaziya iziqagulo zamapharamitha ekufakweni kwemodeli enemithelelangxube yebhayinomiyali ngokuthatha ngokuthi kunokungatholakali okungahleliwe. Ukukhombisa ukusabalalisa okugcwele okunemibandela komphumela ophindaphindekile, imodeli enemithelelangxube iyasetshenziswa. 
Lokhu kuveza ukufaka imodeli yezibalomininingo ezifaka konke okunamamodeli ajwayelekile ayingxube. Ukuncipha kwekhwantayli kunika ithuluzi elingenamsebenzi ukuhlonza imithelela ebingetholwe amamodeli ejwayelekile okuncipha, agxile kuphela kwimini encike ekufakweni kwemodeli. Imodeli yokuncipha kwekhwantayli evuma ukusabalalisa i-Laplace etshekile yetemu elingene ngephutha kwasetshenziswa emininingweni yokubala i-CD4. Ukuqagula okuyikho okuphezulu kwemithelela yamakhovariyenti nezakhi zekhovariyensi-variyensi kwimodeli enemithelelangxube yekhwantayli eyaqaliswa ukusebenza kusetshenziswa i-algorithimu i- Stochastic Approximation Expectation-Maximization. Kwimodeli, imithelela engahlelekile emininingo iphinde yafakwa ukuze kubhekwane nokuxhumana kokuqashelwayo. Ngakho-ke, sithola ukuqagula amapharamitha okunzulu ngemumo yokusabalalisa okunemigomo eyehlukahlukene okunika isithombe esifaka konke nesiphelele semithelela. Ngaphezu kwalokho, ukuthola imibono eyongeziwe ngobudlelwane obusebenzayo phakathi kwevariyebhuli yempendulo namakhovariyethi, amamodeli anemithelelangxube eyongezwayo, njengemodeli yemithelela exubile engemihle yebhayinomiyali eyongezwayo, imodeli enguqunguqu isetshenziselwe ukuqonda kangcono ngemodeli yokuhlaziya izinkombakusasa ezingenamigoqo ezinkimbi emininingweni engumumokuqonda ohlakazwe kakhulu, iyasetshenziswa. Uma kulandelwa imodeli enemithelelangxube yebhayinomiyali engeyinhle eyongezwayo, ukuze kuhambelane nomzamo wokufaka imodeli enemithelelangxube ayikhwayintali eyongezwayo, uhlaka oluguqulekayo nolusebenza ngendlela yezindlela ezingahambelani nepharamethrikhi kanjalo nepharamethrikhi engumumokuqonda wokuhlaziya imininingo okugxile ezicini zomphumela owedlula injwayelosenzo ewumongo, nakho kwenziwa. 
Ivariyebhuli yempendulo esebenzayo yisibalo samasosha omzimba i-CD4 ezigulini ezitheleleke nge-HIV njengomzamokuziqamba Wengxubekwelapha Ethithibalisa igciwane leSandulela Ngculazi Esebenza Kakhulu kanye nezinye izici eziyisisekelo eziyiso zeziguli. Ngakho-ke nakuba lena kuyidizetheshini yocwaningo lwendlelakwenza yocwaningo kwezibalomininingokuphila, kuphinde kwadingidwa okutholakele kwezempilongqondo nakwisifundomumoqoqobantu emphakathini. Ukudingida nokuphothulwe yimiphumela yamamodeli aphakanyisiwe ngesiphakamiso sezindlela zokukwenza ucwaningo oluqhubekayo ukuphothula ucwaningo.
  • Item
    Statistical and deep learning methods for cancer genomic data = Izindlela zokufunda ezijulile zezibalomidanti zemininingo yeqoqozinhlayiyafuzo lomdlavuza.
    (2021) Mohammed, Mohanad Mohammed Adam.; Mwambi, Henry Godwell.; Omolo, Bernard Oguna.
    Statistical and machine learning methods have been applied in broad domains, including the medical field. These methods have a massive impact on healthcare by supporting specialists' decision making in the diagnosis and prognosis of patient disease status and disease progression. Non-communicable diseases (NCDs) remain a major challenge the world over in the 21st century, especially in developing countries where resources are limited. Recent global public health research shows an epidemiological paradigm shift from infectious to non-communicable diseases, which include cancer. Cancer is considered the most devastating among all NCDs and is ranked second to malaria as the leading cause of death in developing countries. Cancer occurs in many different types affecting all community members, where the general mechanism of cancer disease etiology is uncontrolled cell proliferation that leads to a malignant or cancerous tumor, and abnormalities at the molecular level. However, earlier detection and accurate diagnosis of cancer symptoms increase the probability of curing the condition, which has become the best strategy for fighting the disease. In the past few years, a vast amount of cancer data has been generated through new high-throughput technologies. Traditional clinical and experimental approaches lack the capacity to handle such a massive scale of data. Therefore, computational methods have been introduced to biomedical investigations, including gene/biomarker selection for cancer types and stages of the disease. Many computational tools have been developed based on different statistical and machine learning strategies and data science approaches. We used statistical, machine and deep learning methods for cancer types, subtypes, and survival prediction in this work. 
First, we developed a hybrid (DNA mutation and RNA expression) signature and assessed its predictive properties for colorectal cancer (CRC) patients’ mutation status and survival. In addition, we proposed a stacking ensemble deep learning approach to evaluate and compare its predictive performance for cancer types (as a multi-class classification problem) with the different standard machine and deep learning methods. Finally, we assessed the predictive performance of the Cox proportional hazard and random survival forests methods based on a signature obtained using three gene mutations (KRAS, BRAF, and TP53). However, the most significant limitation lies in the sample size being small, and there is a lack of using independent data for validation. Also, we did not consider different features such as methylation and mutation data. Moreover, it is unfortunate that the study did not include detailed simulation studies to compare the traditional statistical and machine learning methods. Overall, the most prominent finding to emerge from this investigation is that combining different data sources leads to more robust statistical significance. Also, the stacking approach is more reliable and promising compared to a single machine or deep learning. Furthermore, the RSF is a proper and striking method for survival analysis since it does not depend on any model assumptions. Iqoqa Izindlela zokufunda zezibalomidanti nemishini zisetshenziswa kakhulu ezizindeni ezibanzi ezibandakanya nomkhakha wezokwelapha. Lezi zindlela zinomthelela omkhulu kwezokunakekela ngokwempilo ngoba zeseka ukuthathwa kwezinqumo ngodokotela abawongoti uma kwenziwa inhlonzasimo kanye nohlahlokwelapha ngesimo sesifo sesiguli noma nokudlebeleka kwesifo. Izifo ezingathelelani ezaziwa ngeNon-communicable diseases (NCDs) zilokhu ziyingqinamba enkulu emhlabeni wonke jikelele ngekhulunyaka lama-21, ikakhulukazi emazweni asathuthuka lapho izinsizasidingo zigqoza khona. 
Ucwaningo lomhlaba olusanda kwenziwa ngempilo yomphakathi lukhombisa ukushintsha kwendlelakubuka ngembangela yokusabalala kwezifo kusuka ekuthelelekeni kuya ezifweni ezingathelelani, ezibandakanya isifo somdlavuza. Umdlavuza uthathwa njengesifo esicekelana phansi kakhulu uma kubukwa wonke amaNCD kanti singesesibili emuva kukamalaleveva njengesifo esiyimbangela yokufa kwabaningi emazweni asathuthuka. Umdlavuza uvela ngezindlela eziningi ezahlukene kanti uhlasela wonke amalunga omphakathi, lapho indlela ejwayelekile yembangelasifo emdlavuzeni kuba ukungalawuleki kokwanda kwamacells agcina edala izimila ezinomdlavuza, kanye nokungalungi ezingeni lamamolecule. Kepha ukutholwa kwesifo ngokushesha kanye nenhlonzasimo enembayo yezimpawu zomdlavuza kukhulisa amathuba okuselapha isifo okuyisu eliphuma phambili lokulwa nomdlavuza. Eminyakeni embalwa edlule, kuqoqwe indathane yemininingo yesifo somdlavuza kusetshenziswa ezobuchwepheshe. Izindlela zakudala zokwelapha nokuhlola okuyilinge kuyehluleka ukuthwala umthamo omkhulu wemininingo. Ngakho-ke, izindlela zobuchwepheshe sezazisiwe ekucwaningeni kokwelapha, okubandakanya ukukhetha ngokofuzo izinhlobo zomdlavuza nezigaba zesifo. Izinsiza zobuchwepheshe eziningi sezakhiwe kusetshenziswa njengesisekelo amasu okufunda emidantizibalo nemishini kanye nezinye izindlela zesayensi yemininingo. Kulolu cwaningo kwasetshenziswa imidantizibalo, imishini nezindlela zokufunda ezijulile ngezinhlobo zomdlavuza, izinhlotshana, kanye nokuqagula isikhathi sokuphila. Kwaqalwa ngokubunjwa kwenhlanganisela iDNA mutation neRNA expression kwase kuhlolwa isimo sezimpawu zokubikezela ngomdlavuza womtshazo icolorectal cancer (CRC) nokusinda kweziguli. Ngaphezu kwalokho, kwahlongozwa indlela ejulile yokufunda ehlanganisayo ukuhlola nokuqhathanisa ukubikezela kwezinhlobo zomdlavuza (njengendlela yokwahlukanisa izinto ngokwezigaba) ngezinhlobo ezahlukene zemishini nezindlela ezijulile zokufunda. 
Kwaphethwa ngokuhlola izindlela zokubikezela kweCox proportional hazard nerandom survival forests kusetshenziswa okutholakele ngokwezinhlobo kusetshenziswa izinguqukolibofuzo ezintathu (KRAS, BRAF, neTP53). Yize kunjalo, isithiyo esikhulu umkhawulo wokuthi isampula lincane, kanti kunokuswelakala kokusebenzisa imininingo ezimele ukuze kube nokuqinisekiswa. Kanti futhi ucwaningo aluzibhekanga ezinye izici ezahlukene njengemininingo yemethylation neyoguqukolibofuzo. Ngaphezu kwalokho, kuyishwa ukuthi lolu cwaningo alubandakanyanga ukusetshenziswa kwesingakwenza ukuze kuqhathaniswe izindlela zezibalomidanti zakudala nezindlela zokufunda ngemishini. Esiphethweni, umphumela onqala ovele kulolu cwaningo owokuthi ukuhlanganisa izinhlobo ezahlukene zezizinda zemininingo ocwaningweni kuholela ekuqineni kobalomidanti oluningi olubalulekile. Nokuthi ukunqwabelanisa kwethembekile futhi kuyethembisa uma kuqhathaniswa nomshini owodwa noma ukufunda okukodwa okujulile. Ngaphezu kwalokho, iRSF iyona ndlela efanele nencomekayo yokuhlaziya ukusinda ngoba ayincikile kwezinye izindlela ezicatshangwayo.
  • Item
    Joint predictors of preterm birth and perinatal death among singleton births at a zonal referral hospital in northern Tanzania: a birth registry based study from 2000 to 2017.
    (2021) Mboya, Innocent Baltazar.; Mwambi, Henry Godwell.; Mahande, Michael J.; Obure, Joseph.
    Background: Globally, preterm birth (birth before 37 completed weeks of gestation) contributes to under-five and newborn deaths. Tanzania ranks tenth among the countries with the highest preterm birth rates and accounts for 2.2% of all preterm births globally. Preterm birth also contributes to perinatal deaths. Perinatal deaths (stillbirths and early neonatal deaths) continue to increase relative to under-five deaths, especially in low- and middle-income countries. Previous exposure to perinatal death increases preterm birth risk. Understanding the independent and joint predictors of these outcomes may inform interventions to accelerate progress towards achieving sustainable development goals. The study aimed to determine the joint predictors of preterm birth and perinatal death among singleton births in northern Tanzania. Methods: The study utilized birth registry data from 2000 to 2017 from Kilimanjaro Christian Medical Center (KCMC) zonal referral hospital, located in Moshi Municipality, Kilimanjaro region, Northern Tanzania. Generalized estimating equations (GEE) estimated the marginal effects of covariates on perinatal death. The predictive capacity of machine learning algorithms was compared with the classical logistic regression model to predict perinatal death. Multinomial logistic regression with cluster-adjusted robust standard errors determined predictors of preterm birth. Joint predictors of preterm birth and perinatal death and their co-occurrence were estimated using random-effects models to account for the correlation between these outcomes. Results: Perinatal mortality in this cohort slightly declined while preterm birth rates were increasing. Maternal demographic characteristics and pregnancy-related conditions and complications increase the risk of these outcomes. 
The joint predictors of higher risk of preterm birth and perinatal death were inadequate (<4) ANC visits, referral for delivery, and complications during pregnancy and childbirth, specifically pre-eclampsia/eclampsia, PPH, LBW, placental abruption, and breech presentation. Younger maternal age (15-24 years), PROM, placenta previa, and male sex of the child were associated with higher odds of preterm birth but a lower likelihood of perinatal death. Conclusion: ANC is a critical entry point for delivering the recommended interventions to pregnant women, especially those at high risk of experiencing adverse pregnancy outcomes. Improved management of complications during pregnancy, childbirth and the postnatal period may eventually lead to a substantial reduction in adverse perinatal outcomes, improving maternal and child health.
  • Item
    Integrating artificial neural networks, simulation and optimisation techniques in improving public emergency ambulance preparedness for heterogeneous regions under stochastic environments.
    (2021) Mapuwei, Tichaona Wilbert.; Bodhlyera, Oliver.; Mwambi, Henry Godwell.
    The Bulawayo Emergency Medical Services (BEMS) department continues to rely on judgemental methods, with limited use of historical data, for future predictions and for strategic, tactical and operational level decision making. The rural to urban migration trend has seen the sprouting of new residential areas, and this has put pressure on the limited health, housing and education resources. It is expected that as population increases, there is a subsequent increase in demand for public emergency services. However, public emergency ambulance demand has been decreasing in Bulawayo over the years. This trend is a sign of limited capacity of the service rather than of demand itself. The situation called for consolidated efforts across all sectors, including research, to restore confidence among residents and reduce health risk and loss of lives. The key objective was to develop a framework that would assist in integrating forecasting, simulation and optimisation techniques for ambulance deployment to predefined locations with heterogeneous demand patterns under stochastic environments, using multiple performance indicators. Secondary data from the Bulawayo Municipality archives from 2010 to 2018 were used for model building and validation. A combination of methods based on mathematics, statistics, operations research and computer science was used for data analysis, model building, sensitivity analysis and numerical experiments. Results indicate that feed forward neural network (FFNN) models are superior to traditional SARIMA models in predicting ambulance demand over a short-term forecasting horizon. The FFNN model is more inclined to value estimation as compared to the SARIMA model, which is directional, as depicted by the linear pattern over time. An ANN model with a 7-(4)-1 architecture was selected to forecast 2019 public emergency ambulance demand (PEAD). 
Peak PEAD is expected in January, March, September and December, whilst lower demand is expected for April, June and July 2019. The simulation models developed mimicked the prevailing levels of service for BEMS with six (6) operational ambulances. However, the average response times were well above 15 minutes, with significantly high average queuing times and numbers of ambulances queuing for service. These performance outcomes were highly undesirable as they pose a great threat to human-based outcomes of safety and satisfaction with regard to service delivery. Optimisation for simulation was conducted by simultaneously minimising the average response time and average queuing time, while maximising throughput ratios. Increasing the number of ambulances influenced the average response time only up to a certain threshold; beyond this threshold, the average response time remained constant rather than decreasing gradually. Ambulance utilisation varied inversely with increases in the fleet size. Numerical experiments revealed that reducing the response time results in a reduction in the number of ambulances required for optimal ambulance deployment. It is imperative to simultaneously consider multiple performance indicators in ambulance deployment, as this balances resource allocation and capacity utilisation while avoiding idleness of essential equipment and human resources. Management should lobby for de-congestion and resurfacing of old and dilapidated roads to increase access and speed when responding to emergency calls. Future research should investigate the influence of varying service time on optimum deployment plans and consider operational costs, wages and other budgetary constraints that influence the allocation of critical but scarce resources such as personnel, equipment and emergency ambulance response vehicles.
  • Item
    Some statistical methods in analysis of single and multiple events with application to infant mortality data.
    (2020) Gatabazi, Paul.; Melesse, Sileshi Fanta.; Ramroop, Shaun.
    Time-to-event analysis, or survival analysis, aims at making inferences on the time elapsed between the recruitment of subjects, or the onset of observations, and the occurrence of some event of interest. Methods used in general statistical analysis, in particular in regression analysis, are not directly applicable to time-to-event data due to covariate correlation, censoring and truncation. When analysing time-to-event data, medical statistics mainly adopts non-parametric methods because of the difficulty in finding an adequate distribution for the phenomenon under study. This study reviews classical non-parametric methods of time-to-event analysis, namely the Aalen Additive Hazards Model (AAHM) through counting and martingale processes, the Cox Proportional Hazard Model (CPHM) and the Cox-Aalen Hazards Model (CAHM), with application to infant mortality at Kigali University Teaching Hospital (KUTH) in Rwanda. The proportional hazards assumption (PHA) was checked by assessing Kaplan-Meier estimates of the survival functions per covariate group. Multiple events models were also reviewed and a model suitable to the dataset was selected. The dataset comprises 2117 newborns and socio-economic and clinical covariates for mothers and children. Two events per subject were modeled, namely death and the occurrence of at least one of the conditions that may also cause long-term death to infants. To overcome the instability of models (also known as checking the consistency of models) and the potentially small sample size, re-sampling was applied to both the CPHM and the appropriate multiple events model. The popular non-parametric re-sampling methods, namely bootstrap and jackknife, were conducted for the available covariates, and the re-sampled models were then compared to the non-re-sampled ones. The results in different models reveal significant and non-significant covariates, the relative risk and the related standard error and confidence intervals per covariate. 
Among the results, it was found that babies born to mothers under 20 years old were at relatively higher risk, and therefore pregnancy in mothers under 20 years old should be avoided. It was also found that an infant's abnormality in weight and head increases the risk of infant mortality; clinically recommended ways of protecting the pregnancy against any cause of infant abnormality were therefore advised.
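The Kaplan-Meier estimate mentioned in the abstract above as the basis for checking the proportional hazards assumption can be sketched in a few lines. This is a minimal illustration of the standard product-limit estimator, not the thesis's own code; the function name `kaplan_meier` and the toy follow-up times are hypothetical and do not come from the KUTH dataset.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Hypothetical toy data: follow-up in days, 1 = death observed, 0 = censored.
toy_curve = kaplan_meier([2, 5, 5, 8, 12, 15], [1, 1, 0, 1, 0, 1])
for day, prob in toy_curve:
    print(f"day {day}: estimated survival {prob:.3f}")
```

Computing one such curve per covariate group and comparing the groups is one common informal check: curves that cross clearly suggest the proportional hazards assumption may not hold.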
  • Item
    Co-morbidity of childhood anaemia and malaria with a district-level spatial effect.
    (2021) Roberts, Danielle Jade.; Zewotir, Temesgen Tenaw.
    Anaemia and malaria are the leading causes of sub-Saharan African childhood morbidity and mortality. This thesis aimed to explore the risk factors as well as the complex relationship between anaemia and malaria in young children across the districts or counties of four contiguous sub-Saharan African countries, namely Kenya, Malawi, Tanzania and Uganda. Nationally representative data from the Demographic and Health Surveys conducted in all four countries was used. The observed prevalence of anaemia and malaria was 52.5% and 19.7%, respectively, with a 15.1% prevalence of co-infection. Machine learning based exploratory classification methods were used to gain insight into the relationships and patterns among the explanatory variables and the two responses. The administrative districts are the level at which public health decisions are made within each of the countries. Accordingly, the best linear unbiased predictor (BLUP) ranking and selection approach was adopted to investigate the district-level spatial effects, while controlling for child-level, household-level and environmental factors. Further to the geoadditive model, a generalised additive mixed model with a spatial effect based on the geographical coordinates of the sampled clusters within the districts was applied. The relationship between the two diseases was further explored using joint modelling approaches: a bivariate copula geoadditive model and shared component model. The child’s age, mother’s education level, household wealth index and cluster altitude were found to be significantly associated with both the anaemia and malaria status of the child. The results of this study can help policy makers target the correct set of interventions or prevent the use of incorrect interventions for anaemia and malaria control and prevention. This aids in the targeted allocation of limited district health system resources within each of these countries.
  • Item
    Bayesian spatial modeling of malnutrition and mortality among under-five children in sub-Saharan Africa.
    (2019) Adeyemi, Rasheed Alani.; Zewotir, Temesgen Tenaw.; Ramroop, Shaun.
    The aim of this thesis is to develop and extend Bayesian statistical models in the area of spatial modeling and apply them to child health outcomes, with particular focus on childhood malnutrition and mortality among under-five children. The easy availability of geo-referenced databases has stimulated a paradigm shift in methodological approaches to spatial analysis. This study reviewed the spatial methods and disease mapping models developed for areal (lattice) data analysis. Observational data collected from complex design surveys and geographical locations often violate the independence assumption of classical regression models. By relaxing the restrictive linearity and normality assumptions of classical regression models, this study first developed a flexible semi-parametric spatial model that accommodates the usual fixed effect, nonlinear and geographical components in a unified model. The approach was explored in the analysis of spatial patterns of child birth outcomes in Nigeria. The study also addressed the issue of disease clustering, which is of interest to epidemiologists and public health officials. The study then proposed a Bayesian hierarchical analysis approach for Poisson count data and formulated a Poisson version of generalized linear mixed models (GLMMs) for analyzing childhood mortality. The model simultaneously addressed the problems of overdispersion and spatial dependence by including the risk factors and random effects in a single model. The proposed approach identified regions with elevated relative risk or clustering of high mortality and evaluated the small-scale geographical disparities in sub-populations across the regions. The study identified further challenges in spatial data analysis, namely spatial autocorrelation and model misspecification. The study then fitted geoadditive mixed (GAM) models to analyze childhood anaemia data belonging to the exponential family of distributions (Gaussian, binary and multinomial). 
The GAM models are an extension of generalized linear mixed models that allows the inclusion of splines for continuous covariate (or time) trends alongside the parametric function. Lastly, the shared component model originally developed for multiple disease mapping was reviewed and modified to suit the binary data at hand. A multivariate conditional autoregressive (MCAR) model was developed and applied to jointly analyze three child malnutrition indicators. The approach facilitated the estimation of the conditional correlation between the diseases and the assessment of the spatial association with the regions and of the geographical variation in individual disease prevalence. The spatial analysis presented in this thesis is useful to inform health-care policy and resource allocation. This thesis contributes to methodological applications in life sciences, environmental sciences, public health and agriculture. The present study expands the existing methods and tools for health impact assessment in public health studies. KEYWORDS: Conditional Autoregressive (CAR) model, Disease Mapping Models, Multiple Disease Mapping, Health Geography, Ecology Models, Spatial Epidemiology, Childhood Health Outcomes.