Statistical modelling of availability of major food cereals in Lesotho: application of regression models and diagnostics
Applications of regression models to cereal data are often limited to estimating and predicting crop production or yield. The usual approach has been to fit a model with little attention to the problems that arise when regression models are applied to real-life data, such as collinearity, lack of fit and violation of model assumptions. If they are not corrected when the model is fitted, these problems can limit the applicability and usefulness of the models and compromise the validity of the results. We applied regression models and diagnostics to national and household data to model the availability of the main cereals in Lesotho, namely maize, sorghum and wheat. The methods applied include the linear regression model, regression and collinearity diagnostics, the Box-Cox transformation, ridge regression, quantile regression and logistic regression together with its extensions to multiple nominal and ordinal responses. A linear model with a first-order autoregressive process, AR(1), was used to determine the factors that affected the availability of cereals at the national level. Case-deletion diagnostics were used to identify extreme observations that influence different quantities of the fitted regression model, such as the estimated parameters, the predicted values and the covariance matrix of the estimates. Collinearity diagnostics revealed more than one collinear relationship coexisting in the data set, identified the variables involved in each relationship and assessed the potential adverse impact of collinearity on the estimated parameters. Ridge regression remedied the collinearity problems by controlling the inflation and instability of the estimates. The Box-Cox transformation corrected non-constant variance and the longer, heavier tails of the data distribution. Together, these remedies increased the applicability and usefulness of the linear models for modelling the availability of cereals.
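Case-deletion diagnostics can be illustrated with Cook's distance, which measures how far all fitted values move when a single observation is removed and the model is refitted. The following minimal Python sketch assumes a simple linear regression with one predictor fitted by ordinary least squares (it ignores any AR(1) error structure); the data and the function names `fit_simple_ols` and `cooks_distances` are hypothetical and are not taken from the study.

```python
"""Case-deletion (Cook's distance) diagnostics for a simple linear regression."""


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cooks_distances(x, y):
    """Cook's D_i = sum_j (yhat_j - yhat_j(i))^2 / (p * s^2), with p = 2 parameters.

    yhat_j(i) is the fitted value for case j from the model refitted without
    case i; s^2 is the residual mean square of the full fit.
    """
    n, p = len(x), 2
    a, b = fit_simple_ols(x, y)
    fitted = [a + b * xi for xi in x]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        # Delete case i, refit, and predict at every original design point.
        a_i, b_i = fit_simple_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum((fj - (a_i + b_i * xj)) ** 2 for fj, xj in zip(fitted, x))
        distances.append(shift / (p * s2))
    return distances


if __name__ == "__main__":
    # Hypothetical series: a roughly linear trend with one extreme final case.
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 25.0]
    d = cooks_distances(x, y)
    print("Cook's distances:", [round(v, 3) for v in d])
    print("Most influential case index:", max(range(len(d)), key=d.__getitem__))
```

For least squares the refits are not strictly needed, since Cook's distance also equals e_i^2 h_i / (p s^2 (1 - h_i)^2), where e_i is the residual and h_i the leverage of case i. The explicit deletion loop is kept because it carries over directly to other case-deletion quantities, such as changes in individual parameter estimates.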
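Variance inflation factors (VIFs) and ridge estimates can both be read off the correlation matrix R of the standardised predictors: the VIFs are the diagonal of R^{-1}, and the ridge estimates are (R + kI)^{-1} r_xy for a ridge constant k. The minimal Python sketch below is an illustration under stated assumptions: it computes VIFs only (identifying the variables involved in each collinear relationship usually calls for condition indices and variance-decomposition proportions, which this sketch does not compute), and it uses a few fixed values of k rather than a data-driven choice. All data and the helper names `standardise`, `inverse` and `ridge` are hypothetical.

```python
"""Variance inflation factors and ridge regression in correlation form."""
import math


def standardise(col):
    """Centre a column and scale it to unit length (correlation form)."""
    mean = sum(col) / len(col)
    centred = [v - mean for v in col]
    norm = math.sqrt(sum(v * v for v in centred))
    return [v / norm for v in centred]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge(columns, y, k):
    """Return (standardised ridge coefficients, ridge VIFs) for ridge constant k >= 0."""
    z = [standardise(c) for c in columns]
    zy = standardise(y)
    p = len(z)
    r = [[dot(z[i], z[j]) for j in range(p)] for i in range(p)]  # correlation matrix R
    r_xy = [dot(z[i], zy) for i in range(p)]  # correlations of predictors with y
    a = inverse([[r[i][j] + (k if i == j else 0.0) for j in range(p)] for i in range(p)])
    beta = [dot(a[i], r_xy) for i in range(p)]  # (R + kI)^{-1} r_xy
    # Var(beta) is proportional to A R A with A = (R + kI)^{-1}; at k = 0 this
    # reduces to R^{-1}, whose diagonal holds the ordinary VIFs.
    cov = matmul(matmul(a, r), a)
    return beta, [cov[j][j] for j in range(p)]


if __name__ == "__main__":
    # Hypothetical predictors: x2 is nearly a copy of x1, x3 is only weakly related.
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    x2 = [1.1, 1.9, 3.05, 4.0, 4.9, 6.1, 7.05, 7.95]
    x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
    y = [3.2, 4.1, 6.9, 7.0, 10.3, 14.1, 12.2, 16.0]
    for k in (0.0, 0.01, 0.1):
        beta, vif = ridge([x1, x2, x3], y, k)
        print(f"k={k:<5} beta={[round(b, 3) for b in beta]} vif={[round(v, 1) for v in vif]}")
```

At k = 0 the reported values are the ordinary VIFs, which are very large for the two nearly collinear predictors. As k grows, every diagonal element of (R + kI)^{-1} R (R + kI)^{-1} shrinks, which is the sense in which ridge regression controls the inflation and instability of the estimates, at the cost of some bias.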
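The Box-Cox family transforms a positive response y to (y^λ - 1)/λ, or to log y when λ = 0, and λ is usually chosen by maximising the profile log-likelihood. That likelihood includes the Jacobian term (λ - 1) Σ log y_i, which makes fits on different transformed scales comparable. The following minimal Python sketch assumes a simple linear regression with one predictor, a coarse grid of λ values and hypothetical data; the function names are illustrative and this is not the study's data or code.

```python
"""Choosing a Box-Cox power by maximising the profile log-likelihood."""
import math


def box_cox(y, lam):
    """Box-Cox transform: (y^lam - 1) / lam, or log(y) when lam = 0 (requires y > 0)."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def rss_simple_ols(x, z):
    """Residual sum of squares from regressing z on an intercept and x."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    return sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))


def profile_loglik(x, y, lam):
    """Profile log-likelihood of lam, up to an additive constant.

    The (lam - 1) * sum(log y) term is the log-Jacobian of the transformation.
    """
    n = len(y)
    rss = rss_simple_ols(x, box_cox(y, lam))
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(v) for v in y)


def best_lambda(x, y, grid):
    """Return the grid value of lam with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))


if __name__ == "__main__":
    # Hypothetical positive response that grows multiplicatively with x.
    x = [float(i) for i in range(1, 11)]
    noise = [0.05, -0.04, 0.03, -0.05, 0.04, -0.03, 0.05, -0.04, 0.03, -0.05]
    y = [math.exp(0.5 + 0.3 * xi + e) for xi, e in zip(x, noise)]
    grid = [round(-2.0 + 0.1 * i, 1) for i in range(41)]  # -2.0, -1.9, ..., 2.0
    print("Selected lambda:", best_lambda(x, y, grid))
```

Because the hypothetical response grows multiplicatively, the selected λ should lie close to 0, pointing to a log transformation; in practice λ is often rounded to a nearby interpretable value such as 0, 0.5 or 1.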
Quantile regression, a robust regression method, was applied to the household data as an alternative to classical regression. Classical ordinary least squares estimates are sensitive to outliers and to error distributions with longer and heavier tails than the normal distribution. Quantile regression estimates appear to be more efficient than least squares estimates for a wide range of error distributions. We studied the availability of cereals further by categorizing households according to the availability of different cereals and applying the logistic regression model and its extensions. Binary logistic regression was applied to model availability versus non-availability of cereals, multinomial logistic regression to model availability with multiple nominal categories, and ordinal logistic regression to model availability with ordered categories, which made full use of the available information. The three variants of the logistic regression model gave consistent results, which also agree with those from the linear regression and quantile regression models.
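Quantile regression replaces the squared-error loss of least squares with the check function ρ_τ(u) = u(τ - I(u < 0)); at τ = 0.5 this gives median (least absolute deviations) regression. The minimal Python sketch below assumes a single predictor and a tiny hypothetical data set with one gross outlier. It finds an exact fit by trying every line through two observations, which relies on the fact that some minimiser of the check loss interpolates as many observations as there are parameters. Dedicated software solves the same problem as a linear program; the function names here are illustrative only.

```python
"""Simple linear quantile regression by exhaustive search over basic solutions."""
from itertools import combinations


def check_loss(u, tau):
    """Koenker and Bassett check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_fit(x, y, tau):
    """Return (intercept, slope) minimising the total check loss at quantile tau.

    With two parameters, some minimiser passes through two observations, so it
    suffices to try the line through every pair of points with distinct x.
    This costs O(n^3) and is only meant for small illustrative data sets.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_fit(x, y):
    """Return (intercept, slope) of the ordinary least-squares line, for comparison."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.0, 3.0, 5.0, 7.0, 9.0, 100.0]  # the last value is a gross outlier
    print("Median regression (tau = 0.5):", quantile_fit(x, y, 0.5))  # (1.0, 2.0)
    print("Least squares:", ols_fit(x, y))
```

The median regression line ignores the outlier and recovers the line y = 1 + 2x followed by the other five points, whereas the least-squares slope is pulled above 14, which illustrates the sensitivity of least squares to outliers.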
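Ordinal logistic regression is often fitted as a proportional-odds (cumulative logit) model, in which each cumulative probability P(Y ≤ j | x) has its own cut-point while all share a single slope. The minimal Python sketch below assumes that form, with the sign convention logit P(Y ≤ j | x) = θ_j - βx (software packages differ on this sign), a single covariate and hypothetical parameter values; the function names are illustrative. It shows how the ordered category probabilities follow from the cumulative logits and checks that every cumulative logit shifts by the same amount when x changes.

```python
"""Category probabilities under a proportional-odds (cumulative logit) model."""
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def category_probs(x, cutpoints, beta):
    """Return P(Y = j | x) for len(cutpoints) + 1 ordered categories.

    Assumes logit P(Y <= j | x) = cutpoints[j] - beta * x, so a positive beta
    shifts probability towards the higher categories.
    """
    if any(lo >= hi for lo, hi in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cut-points must be strictly increasing")
    cumulative = [logistic(c - beta * x) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


def cumulative_logits(probs):
    """Recover logit P(Y <= j) for j = 1..J-1 from the category probabilities."""
    out, running = [], 0.0
    for p in probs[:-1]:
        running += p
        out.append(math.log(running / (1.0 - running)))
    return out


if __name__ == "__main__":
    cutpoints, beta = [-1.0, 1.5], 0.8  # hypothetical: three ordered categories
    for x in (0.0, 1.0):
        print(x, [round(p, 3) for p in category_probs(x, cutpoints, beta)])
    # Proportional odds: every cumulative logit shifts by the same -beta.
    at_0 = cumulative_logits(category_probs(0.0, cutpoints, beta))
    at_1 = cumulative_logits(category_probs(1.0, cutpoints, beta))
    print([round(b - a, 6) for a, b in zip(at_0, at_1)])  # [-0.8, -0.8]
```

Because one slope serves every cut-point, the model exploits the ordering of the categories, which a multinomial model with separate coefficients for each category does not.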