Improving Risk Predictions by Preprocessing Imbalanced Credit Data

Imbalanced credit data sets are databases in which the class of defaulters is heavily under-represented compared with the class of non-defaulters. This situation is very common in real-life credit scoring applications, yet it has received little attention. This paper investigates whether data resampling can improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling depends on the type of classifier. Experimental results show that learning from the resampled sets consistently outperforms learning from the original imbalanced credit data, regardless of the classifier used.
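The specific resampling methods compared are not named here. As a minimal sketch only, assuming numeric feature tuples and labels where 1 marks a defaulter (minority) and 0 a non-defaulter (majority), the code below shows two generic ways to rebalance a credit data set: random under-sampling of the non-defaulters, and SMOTE-style over-sampling, which creates synthetic defaulters by interpolating between a defaulter and one of its nearest defaulter neighbours. The helpers `random_undersample`, `smote_like_oversample` and `balance` are hypothetical illustrations, not the paper's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]  # assumption: each applicant is a tuple of numeric features


def random_undersample(majority: Sequence[Row], target_size: int, seed: int = 0) -> List[Row]:
    """Keep a random subset of the majority class (non-defaulters)."""
    rng = random.Random(seed)
    if target_size >= len(majority):
        return list(majority)
    return rng.sample(list(majority), target_size)


def smote_like_oversample(minority: Sequence[Row], n_new: int, k: int = 5, seed: int = 0) -> List[Row]:
    """Create n_new synthetic minority rows, SMOTE-style: pick a minority row,
    pick one of its k nearest minority neighbours, interpolate between them."""
    rng = random.Random(seed)
    minority = list(minority)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic: List[Row] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority rows (Euclidean distance)
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda r: math.dist(base, r),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position on the segment base -> nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def balance(X: Sequence[Row], y: Sequence[int], strategy: str = "over", seed: int = 0):
    """Return a resampled (X, y) with equal class counts (1 = defaulter)."""
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    if strategy == "under":
        majority = random_undersample(majority, len(minority), seed)
    elif strategy == "over":
        minority = minority + smote_like_oversample(
            minority, len(majority) - len(minority), seed=seed
        )
    else:
        raise ValueError("strategy must be 'over' or 'under'")
    X_out = majority + minority
    y_out = [0] * len(majority) + [1] * len(minority)
    return X_out, y_out


if __name__ == "__main__":
    # Toy data: 95 non-defaulters and 5 defaulters (a 19:1 imbalance).
    rng = random.Random(42)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)] + [
        (rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(5)
    ]
    y = [0] * 95 + [1] * 5
    X_over, y_over = balance(X, y, "over")
    print(y_over.count(1), y_over.count(0))  # 95 95
    X_under, y_under = balance(X, y, "under")
    print(y_under.count(1), y_under.count(0))  # 5 5
```

In an evaluation, resampling of this kind would normally be applied to the training partition only, so that test performance is still measured on the original imbalanced class distribution. Under-sampling discards information about non-defaulters, while over-sampling keeps all rows but adds synthetic defaulters; which works better may depend on the classifier.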
