Two credit scoring models based on dual strategy ensemble trees

Decision tree (DT) is one of the most popular classification algorithms in data mining and machine learning. However, DT-based credit scoring models often perform worse than models built with other techniques. There are two main reasons: in credit scoring, DT is easily affected by (1) noisy data and (2) redundant attributes. In this study, we propose two dual strategy ensemble trees, RS-Bagging DT and Bagging-RS DT, which combine two ensemble strategies, bagging and random subspace, to reduce the influence of noisy data and redundant attributes and to achieve higher classification accuracy. Two real-world credit datasets are selected to demonstrate the effectiveness and feasibility of the proposed methods. Experimental results reveal that single DT gets the lowest average accuracy among five single classifiers, the other four being Logistic Regression Analysis (LRA), Linear Discriminant Analysis (LDA), Multi-layer Perceptron (MLP) and Radial Basis Function Network (RBFN). Moreover, RS-Bagging DT and Bagging-RS DT achieve better results than the five single classifiers and four popular ensemble classifiers, i.e., Bagging DT, Random Subspace DT, Random Forest and Rotation Forest. The results show that RS-Bagging DT and Bagging-RS DT can be used as alternative techniques for credit scoring.
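To make the idea of combining the two strategies concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the names denote the order in which the strategies are nested: RS-Bagging draws a random feature subset first and then bootstrap samples within it, while Bagging-RS draws a bootstrap sample first and then feature subsets within it. The toy tree learner, the function names (`build_tree`, `rs_bagging`, `bagging_rs`), the majority-vote combination, the synthetic data and all parameter values are illustrative assumptions.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Tiny CART-style tree that may split only on the given feature indices."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split on these features
        return majority(labels)
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       features, depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       features, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf; leaves are plain (non-tuple) labels."""
    while isinstance(node, tuple):
        f, t, low, high = node
        node = low if row[f] <= t else high
    return node

def rs_bagging(rows, labels, n_subspaces, n_bags, k, rng):
    """Assumed RS-Bagging order: feature subset first, bootstrap samples inside."""
    trees = []
    for _ in range(n_subspaces):
        feats = rng.sample(range(len(rows[0])), k)
        for _ in range(n_bags):
            idx = [rng.randrange(len(rows)) for _ in rows]
            trees.append(build_tree([rows[i] for i in idx],
                                    [labels[i] for i in idx], feats))
    return trees

def bagging_rs(rows, labels, n_bags, n_subspaces, k, rng):
    """Assumed Bagging-RS order: bootstrap sample first, feature subsets inside."""
    trees = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(rows)) for _ in rows]
        bag_rows = [rows[i] for i in idx]
        bag_labels = [labels[i] for i in idx]
        for _ in range(n_subspaces):
            feats = rng.sample(range(len(rows[0])), k)
            trees.append(build_tree(bag_rows, bag_labels, feats))
    return trees

def predict(trees, row):
    """Combine the trees by simple majority vote (an assumption of this sketch)."""
    return majority([predict_tree(t, row) for t in trees])

# Synthetic stand-in for a credit dataset: features 0-2 carry signal, 3-5 are redundant.
rng = random.Random(0)
X = [[rng.random() for _ in range(6)] for _ in range(60)]
y = [int(r[0] + r[1] + r[2] > 1.5) for r in X]
for name, trees in [("RS-Bagging", rs_bagging(X, y, 3, 3, 3, rng)),
                    ("Bagging-RS", bagging_rs(X, y, 3, 3, 3, rng))]:
    acc = sum(predict(trees, r) == t for r, t in zip(X, y)) / len(X)
    print(name, "trees:", len(trees), "training accuracy:", round(acc, 2))
```

In both variants each tree sees a resampled set of applicants and a restricted set of attributes, which is how the two strategies are meant to dampen the effect of noisy records and redundant attributes respectively. The printed figure is training accuracy on synthetic data and says nothing about performance on real credit datasets.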
