A novel hybrid credit scoring model based on ensemble feature selection and multilayer ensemble classification

Credit scoring focuses on developing empirical models that support the financial decision-making processes of financial institutions and the credit industry. It uses applicants' historical data together with statistical or machine learning techniques to assess the risk associated with an applicant. However, historical data may contain redundant and noisy features that degrade the performance of credit scoring models. The main focus of this paper is to develop a hybrid model that combines feature selection with a multilayer ensemble classifier framework to improve the predictive performance of credit scoring. The proposed hybrid credit scoring model is built in three phases. The first phase performs preprocessing and assigns ranks and weights to the classifiers. In the second phase, an ensemble feature selection approach is applied to the preprocessed dataset. In the final phase, the dataset with the selected features is used in a multilayer ensemble classifier framework. In addition, because the placement of classifiers affects the predictive performance of the ensemble framework, a classifier placement algorithm based on the Choquet integral value is designed. The proposed model is validated on real-world credit scoring datasets, namely the Australian, Japanese, German-categorical, and German-numerical datasets.
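The abstract does not specify which fuzzy measure or which criteria drive the Choquet-based classifier placement. The Python sketch below shows one common construction under explicit assumptions: each classifier is scored on several performance criteria in [0, 1], the fuzzy measure is a Sugeno λ-measure built from hypothetical per-criterion densities, and "placement" is simplified to ordering classifiers by descending Choquet value. The function names (`sugeno_lambda`, `lambda_measure`, `choquet_integral`, `place_classifiers`), the criteria, and all numbers are illustrative only. They are not the authors' algorithm or results.

```python
from math import prod


def sugeno_lambda(densities, iters=200):
    """Solve prod(1 + lam*g_i) = 1 + lam for the Sugeno lambda (lam > -1)."""
    if not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("densities must lie strictly between 0 and 1")
    if len(densities) < 2:
        raise ValueError("at least two criteria are required")
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities already sum to 1: the measure is additive

    def f(lam):
        return prod(1.0 + lam * g for g in densities) - (1.0 + lam)

    if total < 1.0:
        # Root lies in (0, inf); f < 0 just above 0, so grow hi until f(hi) > 0.
        lo, hi, lo_positive = 0.0, 1.0, False
        while f(hi) <= 0.0:
            hi *= 2.0
    else:
        # Root lies in (-1, 0); f > 0 near -1 and f < 0 just below 0.
        lo, hi, lo_positive = -1.0, 0.0, True

    for _ in range(iters):  # plain bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda_measure(subset, densities, lam):
    """Fuzzy measure g(A) of a set A of criterion indices under the lambda-measure."""
    if abs(lam) < 1e-12:
        return sum(densities[i] for i in subset)
    return (prod(1.0 + lam * densities[i] for i in subset) - 1.0) / lam


def choquet_integral(scores, densities):
    """Discrete Choquet integral of non-negative criterion scores."""
    if len(scores) != len(densities):
        raise ValueError("one density per criterion is required")
    lam = sugeno_lambda(densities)
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending scores
    total, prev = 0.0, 0.0  # prev plays the role of x_(0) = 0
    for k, idx in enumerate(order):
        # order[k:] is the set of criteria scoring at least scores[idx]
        total += (scores[idx] - prev) * lambda_measure(order[k:], densities, lam)
        prev = scores[idx]
    return total


def place_classifiers(criteria_scores, densities):
    """Hypothetical placement rule: order classifiers by descending Choquet value."""
    values = {name: choquet_integral(s, densities) for name, s in criteria_scores.items()}
    return sorted(values, key=values.get, reverse=True), values


if __name__ == "__main__":
    # Made-up scores on three criteria (e.g. accuracy, AUC, F1), for illustration only.
    scores = {
        "DT": [0.70, 0.75, 0.72],
        "SVM": [0.85, 0.87, 0.80],
        "MLP": [0.90, 0.92, 0.88],
    }
    densities = [0.4, 0.3, 0.2]  # hypothetical importance of each criterion
    order, values = place_classifiers(scores, densities)
    for name in order:
        print(f"{name}: {values[name]:.4f}")
```

When the densities sum to 1, λ = 0 and the Choquet integral reduces to a weighted average. Otherwise the measure is non-additive: λ > 0 treats criteria as complementary and λ < 0 treats them as redundant, which a plain weighted average cannot express.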
