Two-level classifier ensembles for credit risk assessment

Many techniques have been proposed for credit risk assessment, from statistical models to artificial intelligence methods. In recent years, different approaches to classifier ensembles have been applied successfully to credit scoring problems and have generally proved more accurate than single prediction models. The present paper goes one step further by introducing composite ensembles that jointly use different strategies for inducing diversity. Specifically, it explores the combination of data resampling algorithms (bagging and AdaBoost) with attribute subset selection methods (random subspace and rotation forest) to construct composite ensembles, with the aim of improving prediction performance. The experimental results and statistical tests show that this new two-level classifier ensemble is an appropriate solution for credit scoring problems: it performs better than traditional single ensembles and significantly better than individual classifiers.
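To make the two-level idea concrete, the following is a minimal sketch of one possible composite: bagging at the outer level, with a random subspace ensemble as each bagged member. It is not the authors' implementation. The base learner (a decision stump), the synthetic data, and all function names and parameters (`train_stump`, `random_subspace`, `bagging_of_subspaces`, `n_bags`, `n_members`, `n_features`) are assumptions chosen for illustration only.

```python
import random
from collections import Counter

def train_stump(X, y, features):
    """Fit a one-split classifier restricted to the given feature indices."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for pol in (0, 1):
                # pol is the label predicted when row[f] > t
                errors = sum((pol if row[f] > t else 1 - pol) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, pol)
    _, f, t, pol = best
    return lambda row: pol if row[f] > t else 1 - pol

def majority(votes):
    """Return the most frequent label among the votes."""
    return Counter(votes).most_common(1)[0][0]

def random_subspace(X, y, n_members, n_features, rng):
    """Inner level: each member is trained on a random subset of attributes."""
    d = len(X[0])
    members = [train_stump(X, y, rng.sample(range(d), n_features))
               for _ in range(n_members)]
    return lambda row: majority(m(row) for m in members)

def bagging_of_subspaces(X, y, n_bags, n_members, n_features, seed=0):
    """Outer level: each inner ensemble is trained on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    ensembles = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        ensembles.append(random_subspace(Xb, yb, n_members, n_features, rng))
    return lambda row: majority(e(row) for e in ensembles)

# Synthetic stand-in for a credit data set (assumption: 4 numeric attributes,
# binary label driven by the first two attributes).
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(4)] for _ in range(60)]
y = [1 if row[0] + row[1] > 1.0 else 0 for row in X]

model = bagging_of_subspaces(X, y, n_bags=5, n_members=5, n_features=2, seed=1)
preds = [model(row) for row in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

The two sources of diversity are kept separate on purpose: the outer loop varies the training instances, while the inner ensemble varies the attributes each member sees. Swapping the outer level for boosting or the inner level for rotation forest would follow the same nesting, though those variants are not shown here.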
