Mixture cure models in credit scoring: If and when borrowers default

Mixture cure models were originally proposed in medical statistics to model the long-term survival of cancer patients in terms of two distinct subpopulations: those who are cured of the event of interest and will never relapse, and those who are uncured and remain susceptible to the event. In the present paper, we introduce mixture cure models to the area of credit scoring, where, similarly to the medical setting, a large proportion of the accounts in the dataset may not experience the event of interest, i.e. default, during the loan term. We estimate a mixture cure model predicting (time to) default on a UK personal loan portfolio, and compare its performance to the Cox proportional hazards method and standard logistic regression. Results are presented for credit scoring at the account level and for prediction of the number of defaults at the portfolio level; model performance is evaluated through cross-validation on discrimination and calibration measures. Discrimination performance for all three approaches was found to be high and competitive. Calibration performance for the survival approaches was found to be superior to logistic regression for intermediate time intervals and useful for estimates at a fixed 12-month time horizon, reinforcing the flexibility of survival analysis both as a risk-ranking tool and as a means of providing robust estimates of probability of default over time. Furthermore, the mixture cure model's ability to distinguish between the two subpopulations can offer additional insights, because it estimates the parameters that determine a borrower's susceptibility to default in addition to the parameters that influence the time to default.
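The two-part structure described above can be illustrated with a minimal sketch. It assumes a logistic model for susceptibility (the "if" part) and, purely for illustration, a proportional hazards model with a constant (exponential) baseline hazard for the time to default of susceptible borrowers (the "when" part). The function names, coefficient values and baseline rate are hypothetical and are not taken from the paper.

```python
import math

def susceptibility(z, b):
    """Incidence part: logistic probability that a borrower is susceptible to default.
    z is a covariate vector (first entry 1.0 for the intercept), b the coefficients."""
    linear = sum(bi * zi for bi, zi in zip(b, z))
    return 1.0 / (1.0 + math.exp(-linear))

def susceptible_survival(t, x, beta, baseline_rate):
    """Latency part: survival function of susceptible borrowers at time t.
    Assumes proportional hazards with a constant baseline hazard (illustration only)."""
    cumulative_hazard = baseline_rate * t * math.exp(sum(bi * xi for bi, xi in zip(beta, x)))
    return math.exp(-cumulative_hazard)

def population_survival(t, z, b, x, beta, baseline_rate):
    """Mixture cure survival: S(t) = (1 - pi) + pi * S_u(t),
    where pi is the susceptibility and S_u the survival of susceptible borrowers."""
    pi = susceptibility(z, b)
    return (1.0 - pi) + pi * susceptible_survival(t, x, beta, baseline_rate)

def prob_default_by(t, z, b, x, beta, baseline_rate):
    """Probability that the borrower has defaulted by time t (e.g. t = 12 months)."""
    return 1.0 - population_survival(t, z, b, x, beta, baseline_rate)

# Hypothetical borrower and hypothetical parameter values.
z = [1.0, 0.5]          # intercept plus one covariate for the incidence part
b = [-1.0, 0.8]         # hypothetical incidence coefficients
x = [0.5]               # covariate for the latency part
beta = [0.3]            # hypothetical latency coefficient
pd_12 = prob_default_by(12.0, z, b, x, beta, baseline_rate=0.05)
pd_limit = prob_default_by(1e6, z, b, x, beta, baseline_rate=0.05)
print(round(pd_12, 4), round(pd_limit, 4))
```

In this sketch the probability of default over a long horizon levels off at the susceptibility rather than approaching one, which is the feature that separates a mixture cure model from a standard survival model.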
