Credit Risk Analysis Using Machine and Deep Learning Models

Driven by advances in Big Data technology, data availability, and computing power, most banks and lending institutions are renewing their business models. Credit risk prediction, monitoring, model reliability, and efficient loan processing are key to decision making and transparency. In this work, we build binary classifiers based on machine learning and deep learning models on real data to predict loan default probability. The top 10 most important features from these models are selected and then used in the modelling process to test the stability of the binary classifiers by comparing their performance on separate data. We observe that tree-based models are more stable than models based on multilayer artificial neural networks. This finding raises several questions about the intensive use of deep learning systems in enterprises.
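The workflow described above — fit a classifier, rank features by importance, retrain on the top 10, and compare performance on held-out data — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it uses a synthetic dataset in place of the real loan data, a random forest as the tree-based model, and ROC AUC as the comparison metric, all of which are assumptions for the example.

```python
# Hypothetical sketch of the stability-testing workflow: train a tree-based
# classifier, select the 10 most important features, retrain on them, and
# compare performance on separate (held-out) data.
# Synthetic data stands in for the real loan-default dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Full model trained on all features.
full = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc_full = roc_auc_score(y_te, full.predict_proba(X_te)[:, 1])

# Rank features by impurity-based importance and keep the top 10.
top10 = np.argsort(full.feature_importances_)[::-1][:10]

# Reduced model trained only on the selected features; comparing its score
# on the held-out set against the full model probes stability.
reduced = RandomForestClassifier(n_estimators=200, random_state=0)
reduced.fit(X_tr[:, top10], y_tr)
auc_top10 = roc_auc_score(y_te, reduced.predict_proba(X_te[:, top10])[:, 1])

print(round(auc_full, 3), round(auc_top10, 3))
```

The same comparison would be repeated for the neural-network models (e.g. a multilayer perceptron with a permutation-importance ranking, since neural networks expose no `feature_importances_` attribute) to contrast their stability with the tree-based models.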
