Island Model Genetic Algorithm for Feature Selection in Non-Traditional Credit Risk Evaluation

As digital infrastructure expands into new regions of the globe, it becomes important to develop ways of including more diverse information in financial decisions. Making use of novel data sources, however, requires methods that can evaluate credit on diverse and complex datasets characterized by missing information, dynamic patterns and relationships with decision recommendations, and larger feature sets. Feature selection is one approach that supports applying machine learning to dynamically build credit evaluation models from complex data. Genetic algorithms (GAs) have been shown in prior research to perform well, though at high computational cost. In this paper, we review existing GA approaches, then develop and test a novel method based on niching and on subpopulations that each use different data for fitness evaluation. This formulation lowers computational cost while also improving prediction performance in feature selection. In further experiments, we compare the proposed GA-based feature selection approaches on four traditional credit datasets and a novel emerging-market dataset from China. The results indicate that the advanced GA-based feature selection methods perform more effectively.
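The idea of subpopulations that evaluate fitness on different data can be illustrated with a minimal island-model sketch. It is not the paper's implementation. The synthetic data, the nearest-centroid classifier, the per-feature penalty, the ring migration scheme, and all parameter values and function names (`make_data`, `fitness`, `next_generation`, `island_ga`) are assumptions made for illustration, and no explicit niching mechanism is included. Each island scores feature masks only on its own data partition, which is what keeps a single fitness evaluation cheap.

```python
import random

def make_data(n_rows, n_features, n_informative, rng):
    """Synthetic binary-label rows; only the first n_informative features carry signal."""
    rows = []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        x = [rng.gauss(float(label) if j < n_informative else 0.0, 1.0)
             for j in range(n_features)]
        rows.append((x, label))
    return rows

def fitness(mask, rows, penalty=0.01):
    """Hold-out accuracy of a nearest-centroid classifier on the selected
    features, minus a small penalty per selected feature (both assumed)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    half = len(rows) // 2
    train, test = rows[:half], rows[half:]
    centroids = {}
    for c in (0, 1):
        members = [x for x, y in train if y == c]
        if not members:
            return 0.0
        centroids[c] = [sum(x[j] for x in members) / len(members) for j in cols]
    correct = 0
    for x, y in test:
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == y
    return correct / len(test) - penalty * len(cols)

def next_generation(pop, rows, rng, mutation_rate=0.05):
    """One generation: elitism, binary tournaments, uniform crossover, bit-flip mutation."""
    scores = [fitness(m, rows) for m in pop]

    def pick():
        a, b = rng.sample(range(len(pop)), 2)
        return pop[a] if scores[a] >= scores[b] else pop[b]

    best = pop[max(range(len(pop)), key=scores.__getitem__)]
    children = [best[:]]  # the elite is copied unchanged into slot 0
    while len(children) < len(pop):
        p1, p2 = pick(), pick()
        child = [rng.choice(pair) for pair in zip(p1, p2)]
        child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
        children.append(child)
    return children

def island_ga(rows, n_features, n_islands=4, pop_size=12,
              generations=30, migrate_every=5, seed=0):
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    # Each island sees only its own slice of the data for fitness evaluation.
    parts = [shuffled[i::n_islands] for i in range(n_islands)]
    islands = [[[rng.randint(0, 1) for _ in range(n_features)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [next_generation(pop, part, rng)
                   for pop, part in zip(islands, parts)]
        if gen % migrate_every == 0:
            # Ring migration: the best mask of island i-1 overwrites the
            # last (arbitrary, non-elite) individual of island i.
            bests = []
            for pop, part in zip(islands, parts):
                bests.append(max(pop, key=lambda m: fitness(m, part)))
            for i, pop in enumerate(islands):
                pop[-1] = bests[i - 1][:]
    # Final choice: score every surviving mask once on the full dataset.
    candidates = [m for pop in islands for m in pop]
    return max(candidates, key=lambda m: fitness(m, rows))

data = make_data(200, 10, 3, random.Random(42))
best_mask = island_ga(data, n_features=10)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
```

In this sketch, migration is the only channel through which islands share information, so a mask that scores well on one partition has to survive selection on another partition before it spreads. Only the final selection step touches the full dataset.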
