A multi-objective approach for profit-driven feature selection in credit scoring

Abstract: In credit scoring, feature selection aims to remove irrelevant data in order to improve both the performance and the interpretability of the scorecard. Standard techniques treat feature selection as a single-objective task and rely on statistical criteria such as correlation. Recent studies suggest that using profit-based indicators may improve the quality of scoring models for businesses. We extend the use of profit measures to feature selection and develop a multi-objective wrapper framework based on the NSGA-II genetic algorithm with two fitness functions: the Expected Maximum Profit (EMP), which is maximized, and the number of features, which is minimized. Experiments on multiple credit scoring data sets demonstrate that the proposed approach produces scorecards that can yield a higher expected profit while using fewer features than conventional feature selection strategies.
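
The two-objective trade-off described above can be illustrated with a small Pareto-dominance check, which is the core comparison that NSGA-II applies when ranking candidate feature subsets. This is only a minimal sketch under stated assumptions, not the framework from the paper: the names `dominates` and `pareto_front` are hypothetical, the profit scores are made-up placeholders rather than EMP values computed from a fitted scorecard, and the remaining NSGA-II machinery (crowding distance, selection, crossover, mutation) is omitted.

```python
from typing import List, Tuple

# A candidate is (selected feature indices, profit score).
# The profit scores are illustrative placeholders, NOT real EMP values.
Candidate = Tuple[Tuple[int, ...], float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    (higher profit, fewer features) and strictly better on at least one."""
    profit_a, size_a = a[1], len(a[0])
    profit_b, size_b = b[1], len(b[0])
    no_worse = profit_a >= profit_b and size_a <= size_b
    strictly_better = profit_a > profit_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates: List[Candidate] = [
    ((0, 1, 2, 3), 0.050),  # highest profit, most features
    ((0, 2), 0.048),        # slightly lower profit, half the features
    ((1,), 0.030),          # smallest subset
    ((0, 1, 2), 0.045),     # dominated by (0, 2): more features, lower profit
    ((2, 3), 0.040),        # dominated by (0, 2): same size, lower profit
]

front = pareto_front(candidates)
for features, profit in front:
    print(len(features), "features, profit score", profit)
```

In a wrapper setting, each profit score would come from training and evaluating a scorecard on the corresponding feature subset; the resulting front leaves the final choice between profit and model size to the analyst.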
