An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments

Machine learning and artificial intelligence have achieved human-level performance in many application domains, including image classification, speech recognition and machine translation. In the financial domain, however, expert-based credit risk models still dominate. Establishing meaningful benchmarks and comparisons between machine-learning approaches and human expert-based models is a prerequisite for introducing novel methods. Our main goal in this study is therefore to establish a new benchmark using real consumer data and to provide machine-learning approaches that can serve as baselines on this benchmark. We performed an extensive comparison between the machine-learning approaches and a human expert-based model, the FICO credit scoring system, using Survey of Consumer Finances (SCF) data. Because the SCF data are non-synthetic and comprise a large number of real variables, we applied two variable-selection methods to select the most representative features for effective modelling and to compare the two methods: the first combined hypothesis tests, correlation and random forest-based feature importance measures, and the second used only a random forest-based new approach (NAP). We then built regression models based on various machine-learning algorithms, ranging from logistic regression and support vector machines to an ensemble of gradient boosted trees and deep neural networks. Our results demonstrate that if lending institutions in the 2001s had used their own credit scoring models constructed with the machine-learning methods explored in this study, their expected credit losses would have been lower and they would have been more sustainable. In addition, the deep neural network and XGBoost algorithms trained on the subset selected by NAP achieved the highest area under the curve (AUC) and accuracy, respectively.
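The two evaluation metrics named in the abstract, AUC and accuracy, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions chosen for illustration, with label 1 standing for a default.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly when scores at or above the
    threshold are predicted as class 1 (threshold is an assumed value)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = default, 0 = no default.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

auc_value = roc_auc(labels, scores)
acc_value = accuracy(labels, scores)
```

AUC depends only on how the scores rank the cases, whereas accuracy depends on the chosen threshold, which is one reason two different models can each come out best on one of the two metrics.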
