On the use of data filtering techniques for credit risk prediction with instance-based models

Many techniques have been proposed for credit risk prediction, from statistical models to artificial intelligence methods. However, very few research efforts have been devoted to dealing with the presence of noise and outliers in the training set, which may strongly affect the performance of the prediction model. Accordingly, the aim of the present paper is to systematically investigate whether the application of filtering algorithms leads to an increase in the accuracy of instance-based classifiers in the context of credit risk assessment. The experimental results with 20 different algorithms and 8 credit databases show that the filtered sets perform significantly better than the non-preprocessed training sets when using the nearest neighbour decision rule. The experiments also make it possible to identify which techniques are most robust and accurate when confronted with noisy credit data.
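To make the idea of filtering a training set before applying the nearest neighbour rule concrete, the following is a minimal sketch of one classic filter, Wilson's edited nearest neighbour rule, which discards every training instance whose label disagrees with the majority label of its k nearest neighbours. It is an illustration only: the abstract does not state which 20 algorithms were evaluated, and the function names (`knn_predict`, `wilson_edit`), the Euclidean distance, the value k = 3 and the toy "good"/"bad" applicants are all assumptions made for this example.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, x, k=1, skip=None):
    """Majority vote among the k training points closest to x (Euclidean).

    `skip` optionally excludes one training index, which gives the
    leave-one-out prediction needed by the filter below.
    """
    candidates = [i for i in range(len(train_X)) if i != skip]
    candidates.sort(key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in candidates[:k])
    return votes.most_common(1)[0][0]


def wilson_edit(X, y, k=3):
    """Keep only the instances whose label agrees with their k nearest neighbours."""
    keep = [i for i in range(len(X))
            if knn_predict(X, y, X[i], k=k, skip=i) == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: two clusters of applicants, each containing one
# instance whose label contradicts its surroundings (simulated label noise).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
     (1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.05, 1.05)]
y = ["good", "good", "good", "bad",
     "bad", "bad", "bad", "good"]

filtered_X, filtered_y = wilson_edit(X, y, k=3)

# A query close to the noisy instance in the first cluster.
query = (0.16, 0.16)
raw_prediction = knn_predict(X, y, query, k=1)
filtered_prediction = knn_predict(filtered_X, filtered_y, query, k=1)
```

On this toy set the filter removes the two contradictory instances, and the 1-NN prediction for the query changes from the noisy label to the label of the surrounding cluster. Whether such a filter helps on real credit data is exactly the empirical question the paper addresses.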
