Using Sample Selection to Improve Accuracy and Simplicity of Rules Extracted from Neural Networks for Credit Scoring Applications

In this paper, we present an approach to sample selection for credit scoring that uses an ensemble of neural networks. The ensemble identifies samples that can be considered outliers by checking how accurately each network classifies the original training samples. Samples that are consistently misclassified by the networks in the ensemble are removed from the training dataset. The remaining samples are then used to train and prune another neural network for rule extraction. Our experimental results on publicly available benchmark credit scoring datasets show that eliminating the outliers yields neural networks with higher predictive accuracy and simpler structure than networks trained on the original dataset. A rule extraction algorithm is then applied to generate comprehensible rules from these networks. The extracted rules are more concise than those generated from networks trained on the original datasets.
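
The filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_samples`, the parameter `min_wrong_fraction`, and the reading of "consistently misclassified" as "misclassified by at least a given fraction of ensemble members" are assumptions made for illustration. The sketch takes the ensemble members' predictions as given and does not train any networks.

```python
import numpy as np

def select_samples(predictions, labels, min_wrong_fraction=1.0):
    """Return a boolean mask marking the training samples to keep.

    predictions: array of shape (n_members, n_samples) holding the class
        label predicted by each ensemble member for each training sample.
    labels: array of shape (n_samples,) with the true class labels.
    min_wrong_fraction: hypothetical threshold; a sample is treated as an
        outlier when at least this fraction of members misclassifies it.
        The default 1.0 removes only samples that every member gets wrong.
    """
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    # wrong[m, i] is True when member m misclassifies sample i.
    wrong = predictions != labels[np.newaxis, :]
    # Fraction of ensemble members that misclassify each sample.
    wrong_fraction = wrong.mean(axis=0)
    return wrong_fraction < min_wrong_fraction

# Toy example: 3 ensemble members, 4 training samples.
ensemble_predictions = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
true_labels = [1, 1, 0, 1]

keep_strict = select_samples(ensemble_predictions, true_labels)
keep_loose = select_samples(ensemble_predictions, true_labels,
                            min_wrong_fraction=0.6)
```

In this toy example only the last sample is misclassified by all three members, so the default threshold removes just that one, while the looser threshold of 0.6 also removes the two samples that two of the three members get wrong. The retained samples would then be used to train and prune the network from which rules are extracted.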
