Combining B&B-based hybrid feature selection and the imbalance-oriented multiple-classifier ensemble for imbalanced credit risk assessment

Abstract

An ideal model for credit risk assessment should select important features and handle imbalanced data sets effectively. This paper proposes an integrated method that combines B&B (branch and bound)-based hybrid feature selection (BBHFS) with the imbalance-oriented multiple-classifier ensemble (IOMCE) for imbalanced credit risk assessment, using the support vector machine (SVM) and multiple discriminant analysis (MDA) as base predictors. BBHFS is a hybrid feature selection method that integrates the t-test and B&B with k-fold cross-validation to search for a satisfactory feature subset. The IOMCE divides the majority samples into several subsets and combines each subset with the minority samples, producing several training sets from which a multiple-classifier ensemble model is built. We conduct main experiments using a 1:3 imbalanced corporate credit risk data set with continuous features and extended experiments using a 1:5 imbalanced data set with continuous features a...
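The IOMCE data-partitioning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several parts are assumptions made only for illustration: the function names (`build_training_sets`, `fit_centroids`, `predict_one`, `ensemble_predict`), the nearest-centroid base learner used as a dependency-free stand-in for SVM or MDA, and majority voting as the rule for combining the base classifiers, which the abstract does not specify.

```python
import random
from collections import Counter


def build_training_sets(majority, minority, n_subsets, seed=0):
    """Split the majority samples into n_subsets disjoint parts and pair
    each part with ALL minority samples (label 0 = majority, 1 = minority)."""
    rng = random.Random(seed)
    shuffled = majority[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [
        (part + minority, [0] * len(part) + [1] * len(minority))
        for part in parts
    ]


def fit_centroids(X, y):
    """Stand-in base learner (assumption, NOT the paper's SVM or MDA):
    store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_one(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def ensemble_predict(models, x):
    """Combine base predictions by majority vote (assumed combination rule)."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 1:3 imbalanced data: 10 minority vs. 30 majority samples.
    majority = [[i * 0.01, 0.0] for i in range(30)]
    minority = [[3.0 + i * 0.01, 3.0] for i in range(10)]

    # Three subsets give three roughly balanced training sets.
    training_sets = build_training_sets(majority, minority, n_subsets=3)
    models = [fit_centroids(X, y) for X, y in training_sets]

    print([len(X) for X, _ in training_sets])
    print(ensemble_predict(models, [3.0, 3.0]))
```

With a 1:3 imbalance ratio, choosing three subsets makes each training set approximately balanced, which is the motivation for partitioning the majority class rather than discarding samples. An odd number of base classifiers also avoids tied votes in the binary case.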
