Using semi-supervised classifiers for credit scoring

In credit scoring, low-default portfolios (LDPs) are those for which very little default history exists. This makes it difficult for financial institutions to estimate reliably the probability that a customer will default on a loan. Banking regulation (the Basel II Capital Accord) and best practice nevertheless require an accurate and valid estimate of the probability of default. This article evaluates the suitability of semi-supervised one-class classification (OCC) algorithms as a solution to the LDP problem. The performance of OCC algorithms is compared with that of supervised two-class classification algorithms. The study also investigates the suitability of oversampling, a common approach to dealing with LDPs. One- and two-class classification algorithms are assessed on nine real-world banking data sets that have been modified to replicate LDPs. Our results demonstrate that semi-supervised OCC algorithms should be used instead of supervised two-class classification algorithms only in the near or complete absence of defaulters. Furthermore, we demonstrate that, for data sets whose class labels are unevenly distributed, optimising the threshold value on classifier output yields, in many cases, an improvement in classification performance. Finally, our results suggest that oversampling produces no overall improvement to the best-performing two-class classification algorithms.
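To illustrate what optimising the threshold on classifier output can mean in practice, the following is a minimal sketch and not the procedure used in the study. It assumes a classifier that outputs a default score per customer, labels of 1 for defaulters and 0 for non-defaulters, and the harmonic mean of sensitivity and specificity as the selection criterion. The function names, the criterion, and the toy data are all assumptions made for illustration.

```python
def harmonic_mean_sens_spec(labels, scores, threshold):
    """Harmonic mean of sensitivity and specificity at a given threshold.

    A customer is predicted to default (1) when score >= threshold.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == 0:
                tn += 1
            else:
                fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def best_threshold(labels, scores):
    """Return the observed score that maximises the criterion (hypothetical helper)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: harmonic_mean_sens_spec(labels, scores, t))


# Toy, made-up data: 8 non-defaulters and 2 defaulters, with all scores below 0.5.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.35, 0.40]

default_quality = harmonic_mean_sens_spec(labels, scores, 0.5)
tuned = best_threshold(labels, scores)
tuned_quality = harmonic_mean_sens_spec(labels, scores, tuned)
print(default_quality, tuned, tuned_quality)
```

On imbalanced data, scores for the minority class often sit below the conventional 0.5 cut-off, so a fixed threshold can label every customer a non-defaulter. In a real setting the threshold would be chosen on held-out validation data, not on the data used for the final evaluation.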
