Corporate default forecasting with machine learning

Abstract

We analyze the performance of a set of machine learning models in predicting corporate default risk, using standard statistical models, such as logistic regression, as a benchmark. When only a limited information set is available, as in an external assessment of credit risk, we find that machine learning models provide substantial gains in discriminatory power and precision relative to statistical models. This advantage diminishes when confidential information, such as credit behavioral indicators, is also available, and it becomes negligible when the dataset is small. We also evaluate how a credit allocation rule based on machine learning ratings would affect the overall supply of credit and the number of borrowers gaining access to credit. Machine learning models concentrate a larger share of credit on safer and larger borrowers, which would result in lower credit losses for their lenders.
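The abstract describes two evaluation steps: measuring the discriminatory power of each model's default predictions, then applying a credit allocation rule based on those ratings. The following is a minimal pure-Python sketch of both steps. It assumes that discriminatory power is summarised by the area under the ROC curve (AUC), and that the allocation rule is a simple cut-off on the predicted probability of default (PD). The function names `auc` and `allocate_credit`, the cut-off `max_pd`, and all the data are hypothetical illustrations, not the study's models, rule, or data.

```python
from typing import Sequence, Tuple


def auc(pd_scores: Sequence[float], defaulted: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney probability that
    a randomly chosen defaulter gets a higher predicted PD than a randomly
    chosen non-defaulter (ties count as one half)."""
    pos = [s for s, y in zip(pd_scores, defaulted) if y == 1]
    neg = [s for s, y in zip(pd_scores, defaulted) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def allocate_credit(
    pd_scores: Sequence[float],
    requested: Sequence[float],
    defaulted: Sequence[int],
    max_pd: float,
) -> Tuple[int, float, float]:
    """Hypothetical cut-off rule: grant the full requested amount to every
    borrower whose predicted PD is at or below max_pd, and nothing otherwise.

    Returns (number of borrowers financed, total credit granted,
    credit granted to borrowers that later defaulted)."""
    n_financed = 0
    total = 0.0
    exposure_to_defaulters = 0.0
    for pd, amount, y in zip(pd_scores, requested, defaulted):
        if pd <= max_pd:
            n_financed += 1
            total += amount
            if y == 1:
                exposure_to_defaulters += amount
    return n_financed, total, exposure_to_defaulters


# Made-up toy data for eight firms (1 = default); NOT results from the study.
DEFAULTED = [0, 0, 1, 0, 1, 0, 0, 1]
LOGIT_PD = [0.10, 0.30, 0.25, 0.05, 0.40, 0.35, 0.20, 0.15]  # benchmark model
ML_PD = [0.05, 0.10, 0.60, 0.02, 0.70, 0.30, 0.08, 0.20]     # ML model
REQUESTED = [100.0, 200.0, 150.0, 300.0, 50.0, 120.0, 80.0, 60.0]

if __name__ == "__main__":
    for name, scores in [("logit", LOGIT_PD), ("ML", ML_PD)]:
        print(name, round(auc(scores, DEFAULTED), 3),
              allocate_credit(scores, REQUESTED, DEFAULTED, max_pd=0.25))
```

On these toy numbers the ML scores reach an AUC of about 0.93, against about 0.67 for the logit scores. With the same PD cut-off, both models finance five firms, but the ML ratings grant more credit (740 vs 690) and direct less of it to firms that later default (60 vs 210). The pairwise AUC loop is quadratic in sample size and is chosen here for transparency. A rank-based formula or a library routine would be preferable on real portfolios.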
