The application of classification techniques in modelling credit risk.
The aim of this dissertation is to examine the use of classification techniques to model credit risk through a practice known as credit scoring. In particular, the focus is on one parametric class of classification techniques and one non-parametric class. Since the goal of credit scoring is to improve the quality of the decisions made when evaluating a loan application, the dissertation explores more advanced methods that improve upon the performance of linear discriminant analysis (LDA) and classification and regression trees (CART). For LDA, these methods are quadratic discriminant analysis (QDA), flexible discriminant analysis (FDA) and mixture discriminant analysis (MDA). Multivariate adaptive regression splines (MARS) are used in the FDA procedure, and an expectation-maximization (EM) algorithm is developed to estimate the model parameters in MDA. Techniques that help to improve the performance of CART, such as bagging, random forests and boosting, are also discussed at length. A real-life dataset is used to illustrate how these credit-scoring models can classify a new applicant. The dataset is split into a ‘learning sample’ and a ‘testing sample’. The learning sample is used to develop the credit-scoring model (also known as a scorecard), whilst the testing sample is used to test the predictive capability of the resulting scorecard. The predictive performance of the scorecards is assessed using four measures: the classification error rate, sensitivity, specificity and the area under the ROC curve (AUC). Based on these four measures, the empirical results reveal that there is no single ideal scorecard for modelling credit risk: the best choice depends on the aims and objectives of the lender, the details of the problem and the data structure.
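The four performance measures named above can be made concrete with a short sketch. It is only an illustration, not the dissertation's own implementation: the function name `scorecard_metrics`, the coding of a defaulting applicant as class 1 (the positive class), the 0.5 cut-off and the toy data are all assumptions introduced here. The AUC is computed as the proportion of (defaulter, non-defaulter) pairs in which the defaulter receives the higher score, with ties counted as one half.

```python
def scorecard_metrics(y_true, scores, threshold=0.5):
    """Compute four scorecard performance measures on a testing sample.

    Assumptions (not from the source): y_true uses 1 = default ("bad")
    and 0 = non-default ("good"); scores are estimated default
    probabilities; both classes are present in y_true.
    """
    # Turn scores into class predictions with the assumed cut-off.
    y_pred = [1 if s >= threshold else 0 for s in scores]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    error_rate = (fp + fn) / len(y_true)   # share of misclassified applicants
    sensitivity = tp / (tp + fn)           # share of defaulters detected
    specificity = tn / (tn + fp)           # share of non-defaulters detected

    # AUC via pairwise comparison of positive and negative scores
    # (ties count one half). Quadratic in sample size, fine for a sketch.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))

    return {
        "error_rate": error_rate,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "auc": auc,
    }


# Toy testing sample (invented for illustration only).
metrics = scorecard_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(metrics)
```

Unlike the other three measures, the AUC does not depend on the chosen cut-off, so scorecards can rank differently depending on which measure a lender prioritises.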