A comparative assessment of ensemble learning for credit scoring

Both statistical techniques and Artificial Intelligence (AI) techniques have been explored for credit scoring, an important activity in finance. Although there is no consistent conclusion on which techniques perform better, recent studies suggest that combining multiple classifiers, i.e., ensemble learning, may yield better performance. In this study, we conduct a comparative assessment of three popular ensemble methods, i.e., Bagging, Boosting, and Stacking, based on four base learners, i.e., Logistic Regression Analysis (LRA), Decision Tree (DT), Artificial Neural Network (ANN), and Support Vector Machine (SVM). Experimental results reveal that the three ensemble methods can substantially improve on the performance of individual base learners. In particular, Bagging performs better than Boosting across all credit datasets. In our experiments, Stacking and Bagging DT achieve the best performance in terms of average accuracy, type I error, and type II error.
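To make the Bagging procedure and the three evaluation measures concrete, the following is a minimal sketch in plain Python; it is not the experimental setup of this study. Several things here are assumptions for illustration only: the base learner is a one-feature threshold rule (a decision stump) rather than any of LRA, DT, ANN, or SVM; the helper names (`fit_stump`, `bagging_fit`, `bagging_predict`, `error_rates`) are hypothetical; labels are coded 1 = good applicant and 0 = bad applicant; and type I error is taken as the share of good applicants predicted bad, type II error as the share of bad applicants predicted good. Conventions for type I and type II error differ across the credit scoring literature, so the definitions should be checked against the paper before reuse.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-feature threshold rule (illustrative base learner).

    rows is a list of (x, y) pairs with y in {0, 1}. The rule predicts 1
    when x >= threshold; the threshold with the fewest training errors wins.
    """
    best_t, best_err = None, None
    for t in sorted({x for x, _ in rows}):
        err = sum((x >= t) != bool(y) for x, y in rows)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def bagging_fit(rows, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap resample of rows."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(rows) examples with replacement.
        sample = [rng.choice(rows) for _ in rows]
        models.append(fit_stump(sample))
    return models


def bagging_predict(models, x):
    """Aggregate the stumps' predictions by majority vote."""
    votes = Counter(int(x >= t) for t in models)
    return votes.most_common(1)[0][0]


def error_rates(y_true, y_pred):
    """Return (accuracy, type I error, type II error).

    Assumed convention: 1 = good, 0 = bad;
    type I  = goods predicted bad / all goods,
    type II = bads predicted good / all bads.
    """
    pairs = list(zip(y_true, y_pred))
    goods = [p for t, p in pairs if t == 1]
    bads = [p for t, p in pairs if t == 0]
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    type_1 = sum(p == 0 for p in goods) / len(goods)
    type_2 = sum(p == 1 for p in bads) / len(bads)
    return accuracy, type_1, type_2


if __name__ == "__main__":
    # Toy data (invented): a single score, "good" when the score is >= 5.
    data = [(x, int(x >= 5)) for x in range(10)]
    ensemble = bagging_fit(data, n_models=25, seed=0)
    preds = [bagging_predict(ensemble, x) for x, _ in data]
    print(error_rates([y for _, y in data], preds))
```

An odd number of models avoids ties in the majority vote. Boosting and Stacking differ in how the base learners are trained and combined (reweighting hard examples, and training a meta-learner on base-learner outputs, respectively), so they would need their own sketches.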
