A novel multi-stage hybrid model with enhanced multi-population niche genetic algorithm: An application in credit scoring

Abstract In recent years, artificial intelligence and machine learning have made great progress. Various novel models have been constructed to enhance the prediction performance of binary classification from different aspects, and credit scoring is a typical application of these technologies. In this study, we propose a novel multi-stage hybrid model that combines feature selection and classifier selection to obtain an optimal feature subset and an optimal classifier subset, then uses a classifier ensemble built on these two subsets to improve prediction performance. We also extend the genetic algorithm by proposing an enhanced multi-population niche genetic algorithm (EMPNGA), which improves optimization ability by enhancing the selection, crossover, and mutation steps and by adding niche and migration steps. Furthermore, EMPNGA is combined with several filter methods in feature selection and with prior knowledge in classifier selection to further increase search efficiency. The proposed model is applied to credit scoring to verify its prediction performance, using five datasets and four evaluation metrics. The experimental results confirm that the proposed model outperforms the comparative models, which supports the effectiveness of the proposed approach.
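The abstract names the building blocks of EMPNGA (selection, crossover, mutation, niche, and migration over multiple populations) but does not give their details. The following is a minimal illustrative sketch of a generic multi-population niche genetic algorithm for feature-subset selection, not the authors' EMPNGA. Everything in it is an assumption made for illustration: the toy `fitness` function, the `INFORMATIVE` feature set, fitness sharing as the niche mechanism, ring migration, and all parameter values.

```python
import random

N_FEATURES = 12
INFORMATIVE = {0, 3, 7}  # hypothetical "useful" features for the toy objective

def fitness(mask):
    """Toy objective (assumption): +1 per informative feature kept,
    +0.1 per feature left out, so smaller subsets score higher."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits + 0.1 * (N_FEATURES - sum(mask))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(ind, pop, radius=2):
    """Niche step via fitness sharing: individuals in crowded regions are discounted."""
    niche_count = sum(1 for other in pop if hamming(ind, other) < radius)
    return fitness(ind) / niche_count  # niche_count >= 1 when ind is in pop

def tournament(pop, scores, rng, k=3):
    """Selection step: best of k randomly drawn individuals (by shared fitness)."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]

def evolve(pop, rng, p_mut=0.05):
    """One generation: elitism, tournament selection, uniform crossover, bit-flip mutation."""
    scores = [shared_fitness(ind, pop) for ind in pop]
    children = [max(pop, key=fitness)[:]]  # keep a copy of the raw-fitness best
    while len(children) < len(pop):
        a, b = tournament(pop, scores, rng), tournament(pop, scores, rng)
        child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
        child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
        children.append(child)
    return children

def migrate(pops):
    """Migration step (ring): each population's best replaces the next population's worst."""
    bests = [max(p, key=fitness)[:] for p in pops]
    for i, p in enumerate(pops):
        worst = min(range(len(p)), key=lambda j: fitness(p[j]))
        p[worst] = bests[i - 1]

def run(n_pops=3, pop_size=20, generations=40, migrate_every=5, seed=0):
    rng = random.Random(seed)
    pops = [[[rng.randint(0, 1) for _ in range(N_FEATURES)]
             for _ in range(pop_size)] for _ in range(n_pops)]
    for g in range(1, generations + 1):
        pops = [evolve(p, rng) for p in pops]
        if g % migrate_every == 0:
            migrate(pops)
    return max((ind for p in pops for ind in p), key=fitness)

if __name__ == "__main__":
    best = run()
    print("selected features:", [i for i, bit in enumerate(best) if bit])
```

In a real credit scoring setting, the toy `fitness` would presumably be replaced by a cross-validated score of a classifier trained on the masked features, and the same bit-mask encoding could represent a classifier subset instead of a feature subset.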
