A novel dynamic ensemble selection classifier for an imbalanced data set: An application for credit risk assessment

Abstract

Credit risk assessment is usually treated as an imbalanced classification task and solved with static ensemble classifiers. However, the dynamic ensemble selection (DES) strategy, which can select a different ensemble of classifiers for each query sample, is rarely used. The major challenge is that existing DES algorithms handle imbalanced data poorly. In this paper, a novel combined DES model is developed for imbalanced learning problems. To handle imbalanced data sets, the synthetic minority over-sampling technique (SMOTE) is first used to balance the training set before the candidate classifier pool is generated; then, the weighting mechanism of DES-MI (DES for multi-class imbalance) is used to emphasize minority instances when evaluating classifier competence. To ensure a comprehensive evaluation and the correct selection of the ensemble, the meta-learning framework of META-DES is used to account for multiple competence criteria, and the two-step selection strategy of DES-KNN (k-nearest neighbours) is employed to trade off the competence and diversity of the classifiers. Experiments on 15 imbalanced data sets from the KEEL repository show that the proposed model outperforms seven well-known DES algorithms in terms of the area under the curve (AUC). Moreover, on a real P2P loan data set, the proposed method achieves a lower type I error rate than XGBoost and LightGBM, indicating its effectiveness for credit risk assessment.
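The abstract only names the building blocks, so the following is a minimal, stdlib-only Python sketch of how they could fit together, not the authors' implementation. Its assumptions are labelled in the code as well: the candidate pool is built by bagging decision stumps on the SMOTE-balanced training set (the abstract does not say which base learners or pool-generation method the paper uses); minority emphasis uses inverse class-frequency weights on the region of competence as a simplified stand-in for the DES-MI weighting, not its exact formula; the META-DES meta-learning stage is left out; and the two-step DES-KNN-style selection keeps the most competent classifiers and then the most diverse of those, with diversity measured here by the double-fault rate. The function names `smote`, `fit_stump`, `bagged_pool`, `class_weights` and `des_predict`, and all parameter values, are hypothetical.

```python
import math
import random
from collections import Counter


def smote(minority, k=2, n_new=None, seed=0):
    """SMOTE: move a random minority sample toward one of its k nearest minority
    neighbours, x_new = x + gap * (x_nn - x) with gap ~ U(0, 1)."""
    rng = random.Random(seed)
    n_new = len(minority) if n_new is None else n_new
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def fit_stump(X, y):
    """Hypothetical base learner: the one-feature threshold rule with the
    lowest training error (ties keep the first rule found)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for above in (0, 1):
                err = sum((above if x[f] > t else 1 - above) != yi for x, yi in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, t, above)
    _, f, t, above = best
    return lambda x: above if x[f] > t else 1 - above


def bagged_pool(X, y, n_estimators=10, seed=0):
    """Candidate pool: one stump per bootstrap sample (bagging is an assumption)."""
    rng = random.Random(seed)
    pool = []
    while len(pool) < n_estimators:
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip single-class bootstrap samples
            continue
        pool.append(fit_stump([X[i] for i in idx], yb))
    return pool


def class_weights(y):
    """Inverse class-frequency weights: minority samples count more.
    A simplified stand-in for the DES-MI weighting, not its exact formula."""
    counts = Counter(y)
    return {c: len(y) / (len(counts) * n) for c, n in counts.items()}


def des_predict(query, pool, X_val, y_val, k=7, n_keep=4, j_keep=3):
    """Hypothetical DES rule. The META-DES meta-learning stage is omitted."""
    # Region of competence: the k validation samples nearest to the query.
    roc = sorted(range(len(X_val)), key=lambda i: math.dist(query, X_val[i]))[:k]
    w = class_weights(y_val)

    def competence(clf):  # class-weighted accuracy inside the region
        hits = sum(w[y_val[i]] for i in roc if clf(X_val[i]) == y_val[i])
        return hits / sum(w[y_val[i]] for i in roc)

    def double_fault(a, b):  # share of region samples both classifiers get wrong
        both_wrong = sum(a(X_val[i]) != y_val[i] and b(X_val[i]) != y_val[i] for i in roc)
        return both_wrong / len(roc)

    # Step 1 (DES-KNN style): keep the n_keep most competent classifiers.
    accurate = sorted(pool, key=competence, reverse=True)[:n_keep]

    def mean_df(clf):  # lower mean double-fault = more diverse
        others = [o for o in accurate if o is not clf]
        return sum(double_fault(clf, o) for o in others) / max(len(others), 1)

    # Step 2: of those, keep the j_keep most diverse, then take a majority vote.
    ensemble = sorted(accurate, key=mean_df)[:j_keep]
    return Counter(clf(query) for clf in ensemble).most_common(1)[0][0]


# Toy data: class 1 (minority, e.g. defaulters) has x0 >= 0.7, class 0 has x0 <= 0.5.
majority = [(0.1, 0.2), (0.2, 0.8), (0.3, 0.5), (0.4, 0.1),
            (0.5, 0.9), (0.15, 0.6), (0.35, 0.3), (0.45, 0.7)]
minority = [(0.8, 0.3), (0.9, 0.6), (0.7, 0.8)]
synthetic = smote(minority, k=2, n_new=5)  # 3 + 5 = 8 minority samples
X_train = majority + minority + synthetic
y_train = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
pool = bagged_pool(X_train, y_train)

# For brevity the un-oversampled points double as the validation (DSEL) set.
X_val = majority + minority
y_val = [0] * len(majority) + [1] * len(minority)
print(des_predict((0.95, 0.5), pool, X_val, y_val))  # -> 1
print(des_predict((0.05, 0.5), pool, X_val, y_val))  # -> 0
```

Because the toy classes are separable on the first feature, every stump in this pool agrees on the two queries far from the boundary; the competence and diversity steps only change the vote for queries near the boundary, where stumps with different thresholds disagree. Keeping the validation set un-oversampled is deliberate in this sketch, so that the class weights reflect the real imbalance. For real experiments, the DESlib library implements META-DES, DES-KNN and DES-MI, and imbalanced-learn implements SMOTE.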