An Efficient Multi-layer Ensemble Framework with BPSOGSA-Based Feature Selection for Credit Scoring Data Analysis

Credit scoring is used extensively by the credit industry and financial institutions for financial decision-making. It assesses the risk associated with an applicant on the basis of historical data. However, historical data may contain a large number of redundant and noisy features, which can degrade the performance of credit scoring models. The main focus of this paper is to develop a hybrid credit scoring model that combines feature selection with a multi-layer ensemble classifier framework to improve prediction performance. The proposed model uses a hybrid of binary particle swarm optimization and the gravitational search algorithm (BPSOGSA) for feature selection, and a multi-layer ensemble classifier framework built from five heterogeneous classifiers. A novel V-shaped transfer function is designed for BPSOGSA to map the continuous search space to a binary search space for effective feature selection. A novel fitness function for BPSOGSA is also proposed to compute the fitness value of each search agent. Further, the multi-layer ensemble classifier framework is paired with a novel aggregation function based on a generalized convex function. The proposed hybrid credit scoring model is validated on the Australian, German-categorical, German-numerical and Japanese credit scoring datasets. The experimental results on all four datasets show that the proposed model outperforms random forest and the ensemble frameworks it is compared against, namely majority voting, layered majority voting, weighted voting and layered weighted voting, in terms of accuracy, sensitivity, G-measure and ROC characteristics.
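To illustrate how a V-shaped transfer function moves a continuous search into a binary feature-selection space, the following is a minimal sketch. It does not reproduce the novel transfer function proposed in the paper, which the abstract does not specify; it assumes the common |tanh(v)| form, and the names `v_shaped_transfer` and `update_mask` are hypothetical. Each feature bit is flipped with a probability that grows with the magnitude of its velocity.

```python
import math
import random


def v_shaped_transfer(velocity: float) -> float:
    """Map a real-valued velocity to a flip probability in [0, 1).

    Assumption: the generic |tanh(v)| V-shaped form, not the paper's
    own transfer function.
    """
    return abs(math.tanh(velocity))


def update_mask(mask, velocities, rng):
    """Return a new 0/1 feature mask.

    A bit is flipped when a uniform random draw falls below the
    transfer-function value of its velocity; otherwise it is kept.
    """
    new_mask = []
    for bit, v in zip(mask, velocities):
        if rng.random() < v_shaped_transfer(v):
            new_mask.append(1 - bit)  # large |velocity|: likely flip
        else:
            new_mask.append(bit)      # small |velocity|: likely keep
    return new_mask


# Illustrative values only: 1 means the feature is selected.
rng = random.Random(0)
mask = [1, 0, 1, 1, 0]
velocities = [0.0, 5.0, -5.0, 0.1, 0.0]
new_mask = update_mask(mask, velocities, rng)
print(new_mask)
```

Because the function is symmetric around zero, a large velocity in either direction signals that the current bit should change, while a velocity of zero leaves the bit untouched.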
