Credit Rating Based on Hybrid Sampling and Dynamic Ensemble

The core problem in credit rating is how to build an efficient and accurate classifier on imbalanced datasets. Ensemble learning and resampling techniques have produced many results in this field, but the effectiveness of the classifier is limited when dealing with highly imbalanced credit data. In this paper, we propose a credit rating model based on hybrid sampling and a dynamic ensemble technique. Hybrid sampling helps build a rich pool of base classifiers and improves the accuracy of the ensemble learning model. The combination of hybrid sampling and dynamic ensemble can be applied to various imbalanced data and obtains better classification results. In the resampling phase, the synthetic minority over-sampling technique (SMOTE) and a boundary-sensitive under-sampling technique are used to process the training set. A clustering technique is used to improve the under-sampling and make it more adaptable to highly imbalanced credit data; generating more samples and more representative training subsets enhances the diversity of the base classifiers. A dynamic selection method is then used to select one or more classifiers from the base classifier pool for each test sample. Experiments on three credit datasets show that the combination of hybrid sampling and dynamic ensemble can effectively improve classification performance.
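The two ends of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`smote`, `select_by_local_accuracy`, `predict`), the parameter `k`, and the use of local accuracy on a validation set as the selection criterion are all assumptions made for illustration. The boundary-sensitive under-sampling and the clustering step are omitted because the abstract gives no details about them.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style over-sampling: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours. Assumes at least two minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest other minority samples.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        nb = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def select_by_local_accuracy(classifiers, val_X, val_y, query, k=5):
    """Dynamic selection (assumed criterion): keep the classifier(s) with the
    highest accuracy on the k validation samples nearest to the query."""
    region = sorted(range(len(val_X)),
                    key=lambda i: math.dist(val_X[i], query))[:k]
    scores = [sum(clf(val_X[i]) == val_y[i] for i in region) / len(region)
              for clf in classifiers]
    best = max(scores)
    return [clf for clf, s in zip(classifiers, scores) if s == best]


def predict(classifiers, val_X, val_y, query, k=5):
    """Majority vote among the dynamically selected classifiers."""
    chosen = select_by_local_accuracy(classifiers, val_X, val_y, query, k)
    votes = [clf(query) for clf in chosen]
    return max(set(votes), key=votes.count)
```

Here each classifier is any callable that maps a feature tuple to a label. In the setting the abstract describes, the pool would hold base classifiers trained on the different resampled subsets, and `predict` would be called once per test sample.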
