Aspects of categorical data analysis.
The purpose of this study is to investigate and understand data which are grouped into categories. At the outset, the study presents a review of early research contributions and controversies surrounding categorical data analysis. A contingency table is said to be sparse when many of its cells have small frequencies. Previous research showed that incorrect results were obtained in the analysis of sparse tables. Hence, this dissertation focuses on the effect of sparseness on the modelling and analysis of categorical data. Cressie and Read (1984) proposed the power-divergence statistic as a versatile alternative to statistics proposed in the past. This study includes a detailed discussion of the power-divergence goodness-of-fit statistic, including a review of the minimum power-divergence estimation method and the evaluation of model fit. The effects of sparseness are also investigated for the power-divergence statistic. Comparative reviews of the accuracy, efficiency and performance of the power-divergence family of statistics in large and small samples are presented. Statistical applications of the power-divergence statistic have been conducted in SAS (Statistical Analysis System). Further findings on the effect of small expected frequencies on the accuracy of the X² test are presented from the studies of Tate and Hyer (1973) and Lawal and Upton (1976). Other goodness-of-fit statistics relevant to the sparse multinomial case are discussed. They include Zelterman's (1987) D² goodness-of-fit statistic, Simonoff's (1982, 1983) goodness-of-fit statistics, and Koehler and Larntz's tests for log-linear models. To address contradictions that arise in the sparse case under asymptotic conditions and with increasing sample size, the study discusses Simonoff's use of nonparametric techniques to estimate the variances, as well as his adoption of the jackknife and bootstrap techniques.
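
For orientation, the power-divergence family of Cressie and Read (1984) is indexed by a single parameter lambda: lambda = 1 gives Pearson's X², the limit lambda -> 0 gives the likelihood-ratio statistic G², and lambda = 2/3 is the value Cressie and Read recommended. The following is a minimal Python sketch of that family, not the SAS code used in the study. The function name `power_divergence`, the restriction to lambda > -1, and the example counts are assumptions made here for illustration only.

```python
import math


def power_divergence(observed, expected, lam=2.0 / 3.0):
    """Cressie-Read power-divergence statistic for a single multinomial sample.

    2nI(lam) = 2 / (lam * (lam + 1)) * sum_i O_i * ((O_i / E_i) ** lam - 1)

    Assumption of this sketch: lam > -1, so that empty cells (O_i = 0),
    which are common in sparse tables, contribute a finite term.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    if lam <= -1:
        raise ValueError("this sketch only covers lam > -1")

    if abs(lam) < 1e-12:
        # Limit lam -> 0: likelihood-ratio statistic G^2.
        # Cells with O_i = 0 contribute 0 (limit of x * log x as x -> 0).
        return 2.0 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )

    total = 0.0
    for o, e in zip(observed, expected):
        if o == 0:
            # O_i * ((O_i / E_i) ** lam - 1) tends to 0 as O_i -> 0 for lam > -1.
            continue
        total += o * ((o / e) ** lam - 1.0)
    return 2.0 * total / (lam * (lam + 1.0))


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calls.
    obs = [10, 20, 30]
    exp = [20.0, 20.0, 20.0]
    print("Pearson X^2 (lam = 1):      ", power_divergence(obs, exp, 1.0))
    print("G^2 (lam -> 0):             ", power_divergence(obs, exp, 0.0))
    print("Cressie-Read (lam = 2/3):   ", power_divergence(obs, exp))
```

Writing the statistic as one function of lambda makes it easy to compare members of the family on the same table, which is the kind of comparison the study reports for large and small samples.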