Three local search-based methods for feature selection in credit scoring

Credit scoring is a crucial problem in both finance and banking. In this paper, we treat credit scoring as a classification problem and study three local search-based methods for feature selection. Feature selection is a preprocessing technique applied before the classification task. It keeps only the relevant variables and eliminates the redundant ones, which improves classification accuracy. We study the local search method (LS), the stochastic local search method (SLS), and the variable neighborhood search method (VNS) for feature selection. We then combine each method with the support vector machine (SVM) classifier to find the model that best describes a dataset with a known class variable. The proposed methods (LS+SVM, SLS+SVM, and VNS+SVM) are evaluated on the German and Australian credit datasets and compared with several well-known classifiers. The numerical results are promising and show good performance in favor of our methods.
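
The three search strategies can be illustrated with a minimal Python sketch. The abstract does not specify the neighborhood structure, the parameters, or the fitness function, so everything below is an assumption made for illustration: a feature subset is a tuple of booleans, a neighbor differs by one flipped bit, and `toy_score` is a hypothetical stand-in for the SVM accuracy that a wrapper approach would compute on the selected features. The names `walk_prob`, `k_max`, and `RELEVANT` are likewise hypothetical.

```python
import random

def flip(mask, indices):
    """Return a copy of mask with the given positions toggled."""
    out = list(mask)
    for i in indices:
        out[i] = not out[i]
    return tuple(out)

def local_search(mask, score, max_iters=100):
    """LS: move to the best single-flip neighbor until none improves."""
    best, best_score = mask, score(mask)
    for _ in range(max_iters):
        candidate = max((flip(best, [i]) for i in range(len(best))), key=score)
        cand_score = score(candidate)
        if cand_score <= best_score:
            break  # local optimum reached
        best, best_score = candidate, cand_score
    return best

def stochastic_local_search(mask, score, rng, walk_prob=0.2, max_iters=100):
    """SLS: with probability walk_prob flip a random bit, else take the best flip."""
    current = mask
    best, best_score = mask, score(mask)
    for _ in range(max_iters):
        if rng.random() < walk_prob:
            current = flip(current, [rng.randrange(len(current))])
        else:
            current = max((flip(current, [i]) for i in range(len(current))), key=score)
        current_score = score(current)
        if current_score > best_score:  # remember the best subset seen so far
            best, best_score = current, current_score
    return best

def variable_neighborhood_search(mask, score, rng, k_max=3, max_iters=20):
    """VNS: shake by flipping k random bits, run LS, widen k when LS fails."""
    best, best_score = mask, score(mask)
    for _ in range(max_iters):
        k = 1
        while k <= min(k_max, len(best)):
            shaken = flip(best, rng.sample(range(len(best)), k))
            candidate = local_search(shaken, score)
            cand_score = score(candidate)
            if cand_score > best_score:
                best, best_score, k = candidate, cand_score, 1
            else:
                k += 1
    return best

# Hypothetical fitness: reward relevant features, penalize the others.
# In a wrapper setting this would be replaced by classifier accuracy.
RELEVANT = {0, 2, 5}

def toy_score(mask):
    gain = sum(1.0 for i in RELEVANT if mask[i])
    penalty = 0.25 * sum(1 for i, used in enumerate(mask) if used and i not in RELEVANT)
    return gain - penalty

start = (False,) * 8  # begin with no feature selected
optimum = tuple(i in RELEVANT for i in range(8))
for name, result in [
    ("LS", local_search(start, toy_score)),
    ("SLS", stochastic_local_search(start, toy_score, random.Random(0))),
    ("VNS", variable_neighborhood_search(start, toy_score, random.Random(0))),
]:
    print(name, [i for i, used in enumerate(result) if used], toy_score(result))
```

On this toy landscape every feature contributes independently, so plain LS already reaches the best subset. The random walk in SLS and the shaking step in VNS only pay off when the fitness has local optima, as a classifier-based score typically does.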
