Shallow Self-Learning for Reject Inference in Credit Scoring

Credit scoring models support loan approval decisions in the financial services industry. Lenders train these models on data from previously granted credit applications, where the borrowers' repayment behavior has been observed. This practice creates sampling bias: the scoring model (i.e., the classifier) is trained on accepted cases only, and applying it to screen applications from the population of all borrowers degrades its performance. Reject inference comprises techniques that address this bias by assigning labels to rejected cases. This paper makes two contributions. First, we propose a self-learning framework for reject inference. The framework is geared toward real-world credit scoring requirements by using distinct training regimes for iterative labeling and model training. Second, we introduce a new measure for assessing the effectiveness of reject inference strategies. The measure leverages domain knowledge to avoid artificial labeling of rejected cases during strategy evaluation, and we show that it offers a robust and operational assessment of reject inference strategies. Experiments on a real-world credit scoring data set confirm the superiority of the adjusted self-learning framework over regular self-learning and previous reject inference strategies. We also find strong evidence that the proposed evaluation measure assesses reject inference strategies more reliably, which raises the performance of the eventual credit scoring model.
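
To make the idea concrete, the sketch below shows a minimal, generic self-training loop for reject inference: a classifier is fitted on accepted applicants, the most confidently scored rejects receive pseudo-labels and are added to the training set, and the model is refitted. This is an illustrative assumption-laden example (scikit-learn, logistic regression, hypothetical data and parameter names such as `X_acc`, `X_rej`, and `confidence`); it does not reproduce the paper's adjusted framework with its distinct regimes for labeling and final model training.

```python
# Minimal sketch of generic self-training for reject inference.
# Data and parameter names are illustrative; the paper's adjusted
# framework (separate training regimes for labeling and for the
# final scoring model) is not reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train_reject_inference(X_acc, y_acc, X_rej,
                                n_iter=5, confidence=0.9):
    """Iteratively pseudo-label confidently scored rejects and retrain."""
    X_train, y_train = X_acc.copy(), y_acc.copy()
    X_pool = X_rej.copy()
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_iter):
        model.fit(X_train, y_train)
        if len(X_pool) == 0:
            break
        proba = model.predict_proba(X_pool)[:, 1]  # estimated P(default)
        # Select rejects the current model scores with high confidence.
        confident = (proba >= confidence) | (proba <= 1 - confidence)
        if not confident.any():
            break
        pseudo_labels = (proba[confident] >= 0.5).astype(int)
        # Move confidently labeled rejects into the training set.
        X_train = np.vstack([X_train, X_pool[confident]])
        y_train = np.concatenate([y_train, pseudo_labels])
        X_pool = X_pool[~confident]
    return model
```

In practice, the confidence threshold and the number of iterations govern how many rejects are pseudo-labeled and therefore how much label noise enters the training set, which is the trade-off the proposed framework and evaluation measure are designed to manage.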
