Masters Degrees (Statistics)
Permanent URI for this collection: https://hdl.handle.net/10413/7127
Browsing Masters Degrees (Statistics) by Issue Date
Now showing 1 - 20 of 107
An application of some inventory control techniques. (1992) Samuels, Carol Anne; Moolman, W. H.; Ryan, K. C.
No abstract available.

A study of student academic performance at the University of Natal. (1994) Naidoo, Robert; Murray, Michael.
In this dissertation a study is made of student performance in the Science Faculty of the University of Natal, Durban. In particular, we develop models that can be used to predict the success rate of a student based on his or her matriculation results. These models will be useful for selecting students for admission to university. They may also be used to assist sponsors, bursars and donors in allocating funds to deserving students, and to identify students who might experience difficulties in their university studies.

Forecasting the monthly electricity consumption of municipalities in KwaZulu-Natal. (1997) Walton, Alison Norma; Haines, Linda Margaret.
Eskom is the major electricity supplier in South Africa, and medium-term forecasting within the company is a critical activity to ensure that enough electricity is generated to support the country's growth, that the networks can supply the electricity and that the revenue derived from electricity consumption is managed efficiently. This study investigates the most suitable forecasting technique for predicting monthly electricity consumption one year ahead for four major municipalities in KwaZulu-Natal.

Aspects of categorical data analysis. (1998) Govender, Yogarani; Matthews, Glenda Beverley.
The purpose of this study is to investigate and understand data that are grouped into categories. At the outset, the study presents a review of early research contributions and controversies surrounding categorical data analysis. Sparseness in a contingency table refers to a table in which many cells have small frequencies.
Previous research showed that incorrect results were obtained in the analysis of sparse tables. Hence, this dissertation focuses on the effect of sparseness on the modelling and analysis of categorical data. Cressie and Read (1984) suggested a versatile alternative to previously proposed statistics, the power-divergence statistic. This study includes a detailed discussion of the power-divergence goodness-of-fit statistic, covering a review of the minimum power-divergence estimation method and the evaluation of model fit. The effects of sparseness on the power-divergence statistic are also investigated. Comparative reviews of the accuracy, efficiency and performance of the power-divergence family of statistics in large- and small-sample cases are presented. Statistical applications of the power-divergence statistic were carried out in SAS (Statistical Analysis Software). Further findings on the effect of small expected frequencies on the accuracy of the X² test are presented from the studies of Tate and Hyer (1973) and Lawal and Upton (1976). Other goodness-of-fit statistics relevant to the sparse multinomial case are discussed, including Zelterman's (1987) D² goodness-of-fit statistic, Simonoff's (1982, 1983) goodness-of-fit statistics, and Koehler and Larntz's tests for log-linear models. In addressing contradictions in the sparse-sample case under asymptotic conditions as the sample size increases, Simonoff's use of nonparametric techniques to find the variances, and his adoption of the jackknife and bootstrap techniques, are discussed.

The statistical analyses of a complex survey of banana pests and diseases in Uganda. (1999) Ngoya, Japheth N.; Clarke, G. Peter Y.
No abstract available.

Nonlinear models for neural networks. (2000) Brittain, Susan; Haines, Linda Margaret.
The most common applications of hidden-layer feed-forward neural networks are to fit curves to regression data or to provide a surface from which a classification rule can be found. From a statistical viewpoint, the principle underpinning these networks is that of nonparametric regression, with sigmoidal curves located and scaled so that their sum approximates the data well. The underlying mechanism is that of nonlinear regression: the weights of the network correspond to parameters in the regression model, and the objective function used in training the network defines the error structure. The aim of the present study is to use these statistical insights to critically appraise the reliability and precision of the predicted outputs from a trained hidden-layer feed-forward neural network.

Spatial analysis and efficiency of systematic designs in intercropping experiments. (2002) Wandiembe, Symon Peter; Njuho, Peter Mungai.
In studies involving intercropping plant populations, the main interest is to locate the position of the maximum response or to study the response pattern. Such studies normally require many plant population levels, so designs that minimise the experimental land area, such as spacing systematic designs, are desirable. Randomised block designs may not perform well, as they allow only a few population levels, which may not span the maximum or enable exploration of other features of the response surface. However, the lack of complete randomisation in systematic designs may introduce spatial variability (large-scale and small-scale variation, i.e. trend and spatial dependence) into the observations. No established statistical method has been laid out for analysing data from such designs.
Given that spacing systematic designs are not well explored in the literature, the main thrusts of this study are twofold: to explore the use of spatial modelling techniques in analysing and modelling data from systematic designs, and to evaluate the efficiency of systematic designs used in intercropping experiments. Three classes of models for trend and error modelling are explored or introduced: spatial linear mixed models, semi-parametric mixed models and beta-hat models incorporating spatial variability. The reliability and precision of these methods are demonstrated. The relative efficiency of systematic designs compared with the completely randomised design is evaluated. The analysis of data from systematic designs is shown to be easily implemented. Measures of efficiency that include

Longitudinal survey data analysis. (2006) Nasirumbi, Pamela Opio; Zewotir, Temesgen Tenaw.
To investigate the effect of environmental pollution on the health of children in the Durban South Industrial Basin (DSIB), given its proximity to industrial activities, 233 children from five primary schools were considered. Three of these schools were located in the south of Durban, while the other two were in the northern residential areas that were closer to industrial activities. Data collected included the participants' demographic, health, occupational, social and economic characteristics. In addition, environmental information, specifically measurements of the levels of some ambient air pollutants, was monitored throughout the study. The objective of this thesis is to investigate which of these factors affected the lung function of the children. To achieve this objective, different techniques for analysing sample survey data are investigated, including the design-based and model-based approaches. The nature of the survey data finally leads to the longitudinal mixed model approach.
The multicollinearity between the pollutant variables leads to fitting two separate models: one with the peak counts as the pollutant measures and the other with the 8-hour maximum moving average as the pollutant variables. In selecting the fixed-effects structure, a scatter-plot smoother known as the loess fit is applied to the individual profile plots of the response variable. The random effects and the residual effects are assumed to have different covariance structures: the unstructured (UN) covariance structure is used for the random effects, while the compound symmetric (CS) covariance structure, selected using the Akaike information criterion (AIC), is used for the residual effects. To check the model fit, the profiles of the fitted and observed values of the dependent variables are compared graphically. The data are also characterized by intermittent missingness, and the type of missingness is investigated by applying a modified logistic regression test for data missing at random (MAR). The results indicate that school location, sex and weight are the significant factors for the children's respiratory conditions. More specifically, children in schools located in the northern residential areas are found to have poorer respiratory conditions than those in the Durban South schools. Poorer respiratory conditions are also identified for overweight children.

Analysis of time-to-event data including frailty modeling. (2006) Phipson, Belinda; Mwambi, Henry Godwell.
There are several methods of analysing time-to-event data. These include nonparametric approaches such as Kaplan-Meier estimation and parametric approaches such as regression modeling. Parametric regression modeling involves specifying the distribution of the individuals' survival times, which is commonly chosen to be exponential, Weibull, log-normal, log-logistic or gamma.
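The Kaplan-Meier (product-limit) estimator named in this abstract multiplies, over the distinct event times t_j, the conditional survival probabilities 1 - d_j/n_j, where d_j is the number of events at t_j and n_j the number still at risk just before t_j. The Python sketch below is illustrative only and is not taken from the thesis; the function name `kaplan_meier`, the toy data and the convention that a censored case tied with an event time is still counted at risk are assumptions of this example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times  -- observed follow-up time for each individual
    events -- 1 if the event (e.g. death) was observed, 0 if right-censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)  # d_j per time
    leaving = Counter(times)  # every individual leaves the risk set at their time
    at_risk = len(times)      # n_j: individuals still under observation
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Censored cases tied with t are still counted in n_j here.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: events at times 1, 2, 3; censoring at 2 and 4.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Right-censored individuals contribute to the risk set until they leave it but never reduce S(t) directly, which is what distinguishes this estimator from the naive proportion of survivors.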
Another well-known model, which does not require assumptions about the form of the baseline hazard function, is the Cox proportional hazards model. However, there may be deviations from proportional hazards that can be explained by unaccounted random heterogeneity. In the early 1980s, a series of studies raised concern about possible bias in the estimated treatment effect when important covariates are omitted. Other problems may arise with the traditional proportional hazards model when the data may be correlated, for instance when there is clustering. One way of handling these problems is frailty modeling, in which a random effect is incorporated into the Cox proportional hazards model. While this concept is fairly simple to understand, the estimation of the fixed and random effects becomes complicated. Various methods have been explored by several authors, including the Expectation-Maximisation (EM) algorithm, the penalized partial likelihood approach, Markov Chain Monte Carlo (MCMC) methods, the Monte Carlo EM approach and different methods using the Laplace approximation. The lack of available software is a problem for fitting frailty models, which are usually computationally intensive and may have long processing times. Nevertheless, frailty modeling is an important approach to consider, particularly if the Cox proportional hazards model does not adequately describe the distribution of survival time.

Factors affecting the health status of the people of Lesotho. (2007) Moeti, Abiel.
Lesotho, like any other country of the world, is faced with the task of improving the

Inference from finite population sampling: a unified approach. (2007) Hargovan, Kashmira Ansuyah; Arnab, Raghunath; North, Delia Elizabeth.
In this thesis, we consider the inference aspects of sampling from a finite population.
There are significant differences between traditional statistical inference and inference in finite population sampling. In finite population sampling, the statistician is free to choose the sampling design and is not confined to independent and identically distributed observations, as is often the case in traditional statistical inference. We look at the correspondence between the sampling design and the sampling scheme, and at methods used for drawing samples. The non-existence theorems (Godambe (1955), Hanurav and Basu (1971)) are also discussed. Since a minimum variance unbiased estimator does not exist for finite populations, a number of estimators need to be considered for estimating the same parameter. We discuss the admissibility of estimators and the use of sufficient statistics and the Rao-Blackwell theorem to improve inefficient, inadmissible estimators. Sampling strategies that use auxiliary information about the population are needed, since no sampling strategy can provide an efficient estimator of the population parameter in all situations. Finally, a few well-known sampling strategies are studied and compared under a superpopulation model.

Time series modelling with application to South African inflation data. (2009) Chinomona, Amos.
The research is based on financial time series modelling with special application

Estimating risk determinants of HIV and TB in South Africa. (2009) Mzolo, Thembile; Mwambi, Henry G.; Zuma, Khangelani.
HIV/AIDS has had its greatest adverse impact on TB. People with TB who are also infected with HIV are at greater risk of dying from TB than from HIV. TB is the leading cause of death in HIV-infected individuals in South Africa, and HIV is the driving factor that increases the risk of progression from latent to active TB. In South Africa, no coherent analysis of the risk determinants of HIV and TB has been done at the national level; this study seeks to fill that gap.
This study estimates the risk determinants of HIV and TB using the national household survey conducted by the Human Sciences Research Council in 2005. Since individuals from the same household and enumerator area are more likely to be alike in terms of disease risk, that is, correlated with one another, generalised estimating equations (GEEs) are used to correct for this potential intraclass correlation. Disease occurrence and distribution are highly heterogeneous at the population, household and individual levels. In recognition of this, we model the heterogeneity at the community level through generalised linear mixed models (GLMMs) and Bayesian hierarchical modelling approaches, with the enumerator area representing the community effect. The results showed that HIV is driven by sex, age, race, education, health and condom use at sexual debut. Factors associated with TB are HIV status, sex, education, income and health. Factors common to both diseases are sex, education and health. The results also showed that ignoring the intraclass correlation can result in biased estimates. Inference drawn from the GLMMs and the Bayesian approach provides some degree of confidence in the results. The positive correlation found at the enumerator area level for both HIV and TB indicates that interventions should be aimed at the area level rather than at the individual level.

Applications of Levy processes in finance. (2009) Essay, Ahmed Rashid; O'Hara, J. G.
The option pricing theory set forth by Black and Scholes assumes that the underlying asset can be modeled by geometric Brownian motion, with the Brownian motion being the driving force of uncertainty. Recent empirical studies (Dotsis, Psychoyios & Skiadopolous (2007) [17]) suggest that Brownian motion alone is insufficient to describe the evolution of the underlying asset accurately. A more realistic description of the asset's dynamics would include random jumps in addition to the Brownian motion.
Including jumps in the asset price model leads naturally to the concept of a Lévy process. Lévy processes serve as building blocks for stochastic processes that include jumps in addition to Brownian motion. In this dissertation we first examine the structure and nature of an arbitrary Lévy process. We then introduce the stochastic integral for Lévy processes and the extended version of Itô's lemma, and identify exponential Lévy processes that can serve as Radon-Nikodým derivatives in defining new probability measures. Equipped with this knowledge, we apply Lévy processes in a financial context, with the Lévy process serving as the driving source of uncertainty in a stock price model. In particular, we look at jump-diffusion models such as Merton's (1976) [37] jump-diffusion model and the jump-diffusion model proposed by Kou and Wang (2004) [30]. Since the Lévy processes we consider have more than one source of randomness, we face the difficulty of pricing options in an incomplete market. The options considered are mainly European, where exercise can occur only at maturity. In addition to vanilla calls and puts, we independently derive a closed-form solution for an exchange option under Merton's jump-diffusion model, using conditioning arguments and stochastic integral representations. We also examine some exotic options under the Kou and Wang model, such as barrier options and lookback options, for which the option price is derived in terms of Laplace transforms. We then modify the Kou and Wang model to include only positive jumps, and under this revised model we compute the value of a perpetual put option along with the optimal exercise point.
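For a vanilla European call, Merton's (1976) jump-diffusion model cited in this abstract has a well-known series solution when log jump sizes are normally distributed: the price is a Poisson-weighted sum of Black-Scholes prices, conditioning on the number of jumps n before maturity. The Python sketch below illustrates that standard formula only; it is not the thesis's derivation (which also covers exchange options). The parameter names (`lam` for jump intensity, `mu_j` and `delta` for the mean and standard deviation of the log jump size), the normal log-jump assumption and the truncation at `n_terms` are assumptions of this example.

```python
import math


def norm_cdf(x):
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bs_call(s, k, t, r, sigma):
    # Black-Scholes price of a European call (no dividends).
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def merton_call(s, k, t, r, sigma, lam, mu_j, delta, n_terms=50):
    """Merton jump-diffusion call price as a truncated Poisson-weighted series.

    Assumes log jump sizes ~ N(mu_j, delta^2) and jump intensity lam.
    """
    kappa = math.exp(mu_j + 0.5 * delta ** 2) - 1.0  # mean relative jump size
    lam_prime = lam * (1.0 + kappa)
    price = 0.0
    for n in range(n_terms):
        # Conditional on n jumps: adjusted volatility and drift.
        sigma_n = math.sqrt(sigma ** 2 + n * delta ** 2 / t)
        r_n = r - lam * kappa + n * math.log(1.0 + kappa) / t
        weight = math.exp(-lam_prime * t) * (lam_prime * t) ** n / math.factorial(n)
        price += weight * bs_call(s, k, t, r_n, sigma_n)
    return price


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    print(merton_call(100, 100, 1.0, 0.05, 0.2, lam=0.5, mu_j=-0.1, delta=0.15))
```

With `lam=0` only the n = 0 term survives and the price reduces to the Black-Scholes value (about 10.45 for S = K = 100, T = 1, r = 0.05, σ = 0.2), which is a convenient sanity check on any implementation.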
Keywords: Derivative pricing, Lévy processes, exchange options, stochastic integration.

Modelling CD4+ count over time in HIV positive patients initiated on HAART in South Africa using linear mixed models. (2009) Yende, Nonhlanhla; Mwambi, Henry G.
HIV is among the highly infectious and pathogenic diseases with a high mortality rate. The spread of HIV is influenced by several individual-based epidemiological factors such as age, gender, mobility, sexual partner profile and the presence of sexually transmitted infections (STIs). CD4+ count over time provided the first surrogate marker of HIV disease progression and is currently used for the clinical management of HIV-positive patients. As a key disease marker, the CD4+ count is measured repeatedly among individuals who test HIV positive to monitor disease progression, since HIV/AIDS is known to be a long-wave event. This gives rise to what is commonly known as longitudinal data. The aim of this project is to determine whether the patients' weight, baseline age, sex, viral load and clinic site influence the rate of change in CD4+ count over time. We use data from patients who commenced highly active antiretroviral therapy (HAART) in the AIDS Treatment Project (CAT) of the Centre for the AIDS Programme of Research in South Africa (CAPRISA) between June 2004 and September 2006, including two years of follow-up for each patient. Analysis was done using linear mixed models for longitudinal data. The results showed a larger increase in CD4+ count over time in females and in younger individuals.
However, when baseline log viral load, rather than log viral load at all visits, was fitted in the model, a larger increase in CD4+ count was observed in females and in individuals who were younger, had a higher baseline log viral load and had a lower weight.

Stochastic volatility effects on defaultable bonds. (2009) Mkize, Thembisile; O'Hara, John Gerard.
We study the effects of stochastic volatility on defaultable bonds using the first-passage structural approach. In this approach, Black and Cox (1976) argued that default can happen at any time. This led to the development of the first-passage model, in which a firm (company) defaults when its value falls to a barrier. In the first-passage model the firm's debt is considered to be a single pure discount bond, and default occurs only if the firm value falls below the face value of the bond at maturity. Here the firm's debt can be viewed as a portfolio composed of a risk-free bond and a short put option on the value of the firm. The classic Black-Scholes-Merton model considers only a single liability, with solvency tested at the maturity date, while the extended Black-Scholes-Merton model, developed by Geske, Black and Cox, Leland, Leland and Toft, and others, allows default at any time before maturity to cater for more complex capital structures. In this work, a review of the effect of stochastic volatility on defaultable bonds is given. In addition, the first-passage structural approach and the reduced-form approach are studied. We also introduce symmetry analysis to study some of the equations that appear in option-pricing models. This approach is quite recent and has produced successful results; in this work we lay the foundation of this method.
Keywords: Stochastic Volatility, Defaultable bonds, Lie Symmetries.

Modelling acute HIV infection using longitudinally measured biomarker data including informative drop-out. (2009) Werner, Lise; Mwambi, Henry G.
Background.
Numerous methods have been developed to model longitudinal data. In HIV/AIDS studies, the HIV markers CD4+ count and viral load are measured over time. Informative drop-out and the lower detection limit of viral load assays can bias the results and affect the assumptions of the models.
Objective. The objective of this thesis is to describe the evolution of HIV markers in a cohort of HIV-treatment-naive women acutely infected with HIV-1 subtype C, from the CAPRISA 002 Acute Infection Study in Durban, South Africa.
Methods. Various linear mixed models were fitted to both CD4+ count and viral load, adjusting for repeated measurements and including intercept and slope as random effects. The rate of change in each HIV marker was assessed using weeks post infection as both a linear effect and piecewise linear effects. Left-censoring of viral load was explored to account for missing data resulting from undetectable measurements below the lower detection limit of the assay. Informative drop-out was addressed using joint modelling, in which a longitudinal model and a survival model were linked through a latent Gaussian process. The progression of the HIV markers was described, and the effectiveness and usefulness of each modelling procedure were evaluated.
Results. Sixty-two women were followed for a median of 29 months post infection (IQR 20-39). Viral load increased sharply, by 2.6 log copies/ml per week, in the first 2 weeks of infection and decreased by 0.4 log copies/ml per week over the following fortnight, then decreased at a slower rate thereafter. Similarly, CD4+ count fell by 4.4 square-root cells/µl per week in the first 2 weeks, then recovered slightly, only to decrease again. Left-censoring was unnecessary in this acute infection cohort, as few viral load measures were below the detection limit, and it provided no improvement in model fit.
Conclusion.
Piecewise linear effects proved useful in quantifying the rate at which the HIV markers progress during the first few weeks of HIV infection, whereas modelling time as a linear effect was not very meaningful. Modelling the HIV markers jointly with informative drop-out is shown to be necessary to account for the missing data arising from participants leaving the study to initiate ARV treatment. If this drop-out is ignored, CD4+ count is estimated to be higher than it actually is.

Modeling environmental factors affecting the growth of eucalypt clones. (2009) Chauke, Morries; Zewotir, Temesgen Tenaw; Ndlovu, Principal; Grzeskowiak, Valerie.
Tree growth is influenced by environmental and genetic factors. The same tree growing in different areas will have different growth patterns, and trees with different genetic material, e.g. pine and Eucalyptus trees, growing under the same environmental conditions also have different growth patterns. Plantation trees in South Africa are mainly used for pulp and paper production, where growth is an important economic factor: fast-growing plantations are available for processing earlier than slow-growing ones. It is therefore important to understand the role played by environmental factors, especially climatic factors, in tree growth. This thesis investigated the climatic effects on the radial growth of two Eucalyptus clones using growth data collected daily over five years by Sappi. The general linear model and time series models were used to assess the effects of climate on the radial growth of the two clones. The two clones were found to have similar overall growth patterns over time but different growth rates. The growth pattern of both clones appears to be characterized by substantial jumps or changes in growth rate over time; the times at which these occur are referred to as "breakpoints".
Piecewise linear regression was used to estimate when the breakpoints occur, after which the climatic effects associated with these breakpoints were investigated. The linear and time series modeling results indicated that the contribution of climatic factors to the radial growth of the Eucalyptus clones was small; most of the variation in radial growth was explained by the age of the trees. Consequently, this thesis also investigated the appropriate functional relationship between radial growth and age. In particular, nonlinear growth models were used to model the radial growth process. The growth curve models investigated included among their parameters the maximum radius and the age at which the radial growth rate is largest. The maximum growth rate was calculated from the estimated model for each clone. The results indicated that the two clones reach their maximum growth rates at different times, at around 368 and 376 days respectively. Furthermore, the maximum radius was found to differ between the two clones.

Modeling the factors affecting cereal crop yields in the Amhara National Regional State of Ethiopia. (2010) Mohammed, Yunus Hussien; Ramroop, Shaun; Zewotir, Temesgen.
Agriculture in the Amhara National Regional State is characterised by the production of cereal crops, which occupy the largest share (84.3%) of the total crop area cultivated in the region. It is therefore important to investigate which factors influence cereal crop yields, particularly for the five major cereals in the study region: barley, maize, sorghum, teff and wheat. In this thesis, using data collected by the Central Statistical Agency of Ethiopia, various statistical methods such as multiple regression analysis were applied to investigate the factors that influence the mean yields of the major cereal crops.
In addition, a mixed model analysis was implemented to assess the effects associated with the sampling units (enumeration areas), and a cluster analysis was used to classify the zones of the region into similar groups. The multiple regression results indicate that the mean yields of all the cereals studied are affected by zone, fertilizer type and crop damage. Beyond these, barley is affected by the extension programme; maize by seed type, irrigation and soil erosion protection; sorghum and teff by crop prevention method, extension programme, soil erosion protection and the gender of the household head; and wheat by crop prevention method, extension programme and the gender of the household head. The results of the mixed model analysis differed entirely from the regression results because of the observed dependence of the cereals' mean yields on the sampling unit. The hierarchical cluster analysis identified five groups (clusters), which appear to agree with the geographical proximity of the locations and the similarity of the crops produced.

Statistical and mathematical modelling of HIV and AIDS, effect of reverse transcriptase inhibitors and causal inference for HIV mortality. (2010) Ngwenya, Olina; Mwambi, Henry G.
The HIV and AIDS epidemic has remained one of the leading causes of death in the world and has been destructive in Africa, with Sub-Saharan Africa remaining the epidemiological locus of the epidemic. HIV and AIDS hinder development by erasing decades of health, economic and social progress, reducing life expectancy by years and deepening poverty [57]. The most urgent public-health problem globally is to devise effective strategies to minimize the destruction caused by the HIV and AIDS epidemic. Because of these problems, well-defined endpoints to evaluate treatment benefits are needed.
The surrogate and true endpoints for a disease need to be specified. The purpose of a surrogate endpoint is to draw conclusions about the effect of an intervention on the true endpoint without having to observe the true endpoint, so it is important to understand surrogate validation methods. At present, the question remains whether CD4 count and viral load are good surrogate markers for death in HIV, or whether better surrogate markers exist. This dissertation was undertaken to obtain some clarity on this question by adopting a mathematical model of HIV at the immune system level that includes the impact of treatment with reverse transcriptase inhibitors (RTIs). For an understanding of HIV, the dissertation begins with a description of the human immune system, HIV virion structure, HIV disease progression and HIV drugs. A review of an existing mathematical model follows, together with analyses and simulations of this model, which give insight into the dynamics of the CD4 count, viral load and HIV therapy. Surrogate marker validation methods are then presented. Finally, a generalized estimating equations (GEE) approach was used to analyse real data for HIV-positive individuals from the Centre for the AIDS Programme of Research in South Africa (CAPRISA). Numerical simulations of the HIV dynamic model with treatment suggest that the higher the treatment efficacy, the fewer infected cells remain in the body. The infected cells are suppressed to a lower threshold value but do not completely disappear as long as the treatment is not 100% efficacious. Further numerical simulations suggest that a low proportion of infectious virions (ω) is advantageous at the individual level, because the individual would produce few infectious virions to infect healthy cells. The GEE analysis suggests that CD4 count < 200 and viral load are highly associated with death, meaning that they are good surrogate markers for death.
An interesting finding from the analysis of this particular CAPRISA dataset was that low CD4 count and high viral load act independently (additively) as surrogates for HIV survival; the interaction effect was found to be insignificant. The individual characteristics found to be significantly associated with HIV-related death were weight, CD4 count < 200 and viral load.