Credit scoring using ensemble classification based on variable weighting clustering

Credit scoring plays an important role in financial institutions, debt-based crowdfunding platforms, and peer-to-peer lending platforms. In the last few years, ensemble methods have become increasingly popular for credit scoring. However, the performance of ensemble methods is sensitive to the parameter settings and to the number of base classifiers. Ensemble classification based on clustering can determine the number of base classifiers automatically through clustering, and can find suitable parameter settings for the base classifiers by training each one individually on a training subset formed by combining clusters. In this way, the adverse effect of manually setting the parameters and the number of base classifiers can be avoided. However, conventional clustering methods do not consider that attributes contribute differently to the distance metric, which may decrease the performance of ensemble classifiers built on them. Moreover, unbalanced training subsets degrade the base classifiers, which in turn degrades the ensemble classifier. To address these problems, our approach first assigns different weights to different variables when measuring the distance between two instances in the clustering step, and then adopts the Subagging resampling method to deal with unbalanced training subsets in the training process. Experimental results show that our approach can improve the performance of the ensemble classifier.
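The two ingredients named above, a variable-weighted distance in the clustering step and subsampling of unbalanced training subsets, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the variable weights are obtained or how Subagging subsamples are drawn, so the sketch assumes the weights `w` are supplied by the caller and that each subsample takes an equal number of instances per class without replacement. The function names `weighted_sq_dist`, `weighted_kmeans`, and `balanced_subsamples` are hypothetical.

```python
import numpy as np


def weighted_sq_dist(X, centers, w):
    """Squared Euclidean distance where variable j is scaled by weight w[j].

    Returns an array of shape (n_instances, n_centers).
    """
    diff = X[:, None, :] - centers[None, :, :]
    return (w * diff ** 2).sum(axis=2)


def weighted_kmeans(X, k, w, n_iter=20, seed=0):
    """Plain k-means that uses the weighted distance above.

    Assumption: w is fixed and given; the paper's weighting scheme
    is not described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct instances.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = weighted_sq_dist(X, centers, w).argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    labels = weighted_sq_dist(X, centers, w).argmin(axis=1)
    return labels, centers


def balanced_subsamples(y, n_rounds, frac=0.5, seed=0):
    """Yield index arrays drawn without replacement, same count per class.

    Assumption: balancing by equal per-class counts is one plausible way
    to apply subsampling to an unbalanced subset, not the paper's recipe.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    smallest = min(int((y == c).sum()) for c in classes)
    per_class = max(1, int(frac * smallest))
    for _ in range(n_rounds):
        parts = [
            rng.choice(np.flatnonzero(y == c), size=per_class, replace=False)
            for c in classes
        ]
        yield np.concatenate(parts)
```

In use, one base classifier would be trained on each index array produced by `balanced_subsamples` for a given cluster-derived training subset, and the predictions aggregated, for example by majority vote. Setting a weight in `w` to zero removes that variable from the distance entirely, which shows how strongly the weighting can change the resulting clusters.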
