The management of missing categorical data : comparison of multiple imputation and subset correspondence analysis.

Hendry, Gillian Margaret.

dc.contributor.advisor	Zewotir, Temesgen Tenaw.
dc.contributor.advisor	Naidoo, Rajen.
dc.contributor.advisor	North, Delia Elizabeth.
dc.contributor.author	Hendry, Gillian Margaret.
dc.date.accessioned	2018-10-15T12:20:51Z
dc.date.available	2018-10-15T12:20:51Z
dc.date.created	2015
dc.date.issued	2015
dc.description	Doctor of Philosophy in Applied Statistics, University of KwaZulu-Natal, Westville, 2015.	en_US
dc.description.abstract	Missing data is a common problem in research, and the manner in which this ‘missingness’ is managed is crucial to the validity of analysis outcomes. This study illustrates the use of two diverse methods to handle, in particular, missing categorical data. These methods are applied to a dataset that was intended to identify relationships between asthma severity in children and environmental, behavioural, genetic and socio-economic factors. This dataset suffered from substantial missingness. The first method involved the application of two approaches to multiple imputation, each adopting different distributional specifications. A practical challenge, previously undocumented, was encountered in the application of multiple imputation when interactions, to be identified and included in the analysis model, were needed for the imputation model. This study found that by imputing a single set of complete data using the expectation maximization (EM) algorithm for covariance matrices, it was possible to identify relevant interactions for inclusion in the imputation model. The second method illustrated the application of correspondence analysis to a subset of the data that includes only the measured data categories. The application of subset correspondence analysis (s-CA) with incomplete data, as well as its sensitivity to the type of missingness, has not been well documented, if at all. There is also no evidence of research in which interactions have been added to an analysis with s-CA. In this study its use, both with and without interactions, was illustrated and the results, when compared to those from the multiple imputation approach, were found to be similar and favourably complementary. A simulation study found that s-CA performed well with any type of missingness, provided the amount of missingness was less than 30% on any variable with incomplete data.
Across all analyses, relationships found between asthma severity and the factors studied were consistent with known relationships, thus providing confirmation of the reliability of the methods.	en_US
dc.identifier.uri	http://hdl.handle.net/10413/15643
dc.language.iso	en_ZA	en_US
dc.subject.other	Missing data.	en_US
dc.subject.other	Asthma severity.	en_US
dc.subject.other	Asthma categories.	en_US
dc.subject.other	Children with asthma.	en_US
dc.title	The management of missing categorical data : comparison of multiple imputation and subset correspondence analysis.	en_US
dc.type	Thesis	en_US

Files

Original bundle

Name: Hendry_Gillian_Margaret_2015.pdf
Size: 4.92 MB
Format: Adobe Portable Document Format
Description: Thesis.

Collections

Doctoral Degrees (Statistics)