An empirical comparison of classification algorithms for mortgage default prediction: evidence from a distressed mortgage market

This paper evaluates the performance of several modelling approaches for predicting future mortgage default status. Boosted regression trees, random forests, penalised linear and semi-parametric logistic regression models are applied to four portfolios of over 300,000 Irish owner-occupier mortgages. The main findings are that the selected approaches differ in predictive power and that boosted regression trees significantly outperform logistic regression. This suggests that boosted regression trees can be a useful addition to the current toolkit for mortgage credit risk assessment by banks and regulators.
