Credit Scoring Model based on Kernel Density Estimation and Support Vector Machine for Group Feature Selection

A credit scoring model (CSM) is a tool typically used to decide whether to accept or reject a loan application. Selecting an appropriate feature subset is crucial to the performance of such a model. In this paper, we propose a novel framework to improve feature selection for credit scoring. First, kernel density estimation (KDE) is used to construct feature groups, which combines similar features and reduces unnecessary computation. Second, features may be related not only by similarity but also by other meaningful relations, such as part-of and has-a. We therefore calculate a group score for each feature group and draw the corresponding radar map from these scores. The purpose is to improve the quality of the final selected feature subset and to capture the specific semantics of each feature group. Finally, each feature group is treated as a separate entity during feature selection in order to obtain the optimal feature subset. All features are treated as one-dimensional vectors. The support vector machine (SVM) algorithm is used for training and prediction, and the corresponding calculations are performed to obtain a total credit score. Extensive experiments on the UCI benchmark database demonstrate the advantages and effectiveness of the proposed algorithm.
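The abstract does not state how the KDE output is turned into feature groups, so the following is only a minimal sketch of one plausible reading: each feature is standardized, its density is estimated with a Gaussian KDE on a shared grid, and features whose density curves are close in L1 distance are placed in the same group. The bandwidth rule, the L1 criterion, the greedy assignment, the `threshold` value, and the toy data are all assumptions made for illustration; they are not taken from the paper. The group-score, radar-map, and SVM steps are not shown.

```python
import math
import statistics


def standardize(values):
    """Rescale a feature to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant features
    return [(v - mean) / std for v in values]


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]


def density_curve(values, grid):
    """KDE of a standardized feature on a shared grid."""
    z = standardize(values)
    # Assumed bandwidth: normal-reference rule h = 1.06 * sigma * n^(-1/5),
    # with sigma = 1 because the feature has just been standardized.
    bandwidth = 1.06 * len(z) ** (-0.2)
    return gaussian_kde(z, grid, bandwidth)


def l1_distance(curve_a, curve_b, step):
    """Riemann-sum approximation of the integrated absolute difference."""
    return step * sum(abs(a - b) for a, b in zip(curve_a, curve_b))


def group_features(features, threshold=0.2, lo=-4.0, hi=4.0, points=81):
    """Greedy grouping (an assumption, not the paper's rule): a feature joins
    the first group whose first member has a density curve within `threshold`;
    otherwise it starts a new group."""
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    curves = [density_curve(f, grid) for f in features]
    groups = []
    for idx, curve in enumerate(curves):
        for group in groups:
            if l1_distance(curve, curves[group[0]], step) <= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


def make_toy_features(n=200):
    """Hypothetical toy data: two bell-shaped and two right-skewed features,
    built from evenly spaced quantiles so the example is deterministic."""
    probs = [(i + 0.5) / n for i in range(n)]
    normal = statistics.NormalDist()
    z = [normal.inv_cdf(p) for p in probs]   # normal-shaped values
    e = [-math.log(1.0 - p) for p in probs]  # exponential-shaped values
    return [
        [50.0 + 10.0 * v for v in z],  # feature 0: symmetric
        [2.0 * v for v in e],          # feature 1: right-skewed
        [3.0 * v for v in z],          # feature 2: symmetric, other scale
        [0.5 * v for v in e],          # feature 3: right-skewed, other scale
    ]


print(group_features(make_toy_features()))
```

On this toy data the two symmetric features should fall into one group and the two skewed features into another, because standardization removes differences in location and scale and leaves only the shape of the distribution. In a full pipeline each resulting group would then be handed to the feature-selection step as a single entity before the classifier is trained.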
