A data driven ensemble classifier for credit scoring analysis

This study focuses on predicting, from the information initially supplied, whether a credit applicant can be categorized as good, bad, or borderline. This is essentially a classification task for credit scoring. Given its importance, many researchers have recently worked on ensembles of classifiers. However, unrepresentative samples drastically reduce the accuracy of the deployed classifier, and, to the best of our knowledge, few have attempted to preprocess the input samples into more homogeneous cluster groups and then fit the ensemble classifier accordingly. For this reason, we introduce the concept of class-wise classification as a preprocessing step in order to obtain an efficient ensemble classifier. This strategy is expected to work better than a direct ensemble of classifiers without the preprocessing step. The proposed ensemble classifier is constructed by incorporating several data mining techniques: optimal associate binning discretizes continuous values, and a neural network, a support vector machine, and a Bayesian network are used to augment the ensemble classifier. In particular, the Markov blanket concept of Bayesian networks allows for a natural form of feature selection, which provides a basis for mining association rules. The learned knowledge is represented in multiple forms, including a causal diagram and constrained association rules. The data-driven nature of the proposed system distinguishes it from existing hybrid/ensemble credit scoring systems.
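
The Markov blanket idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `markov_blanket`, the toy network, and every variable name in it (`income`, `late_payments`, and so on) are hypothetical and chosen only for illustration. The sketch assumes the Bayesian network structure is already known and stored as a mapping from each node to its parents. The blanket of a target node is then its parents, its children, and the other parents of those children; given these variables, the target is conditionally independent of all remaining variables, so they form a natural feature subset.

```python
from typing import Dict, List, Set


def markov_blanket(parents: Dict[str, List[str]], target: str) -> Set[str]:
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps every node to the list of its parent nodes.
    The blanket is: parents of target, children of target,
    and the other parents of those children.
    """
    blanket = set(parents.get(target, []))       # parents of the target
    for node, node_parents in parents.items():
        if target in node_parents:               # `node` is a child of the target
            blanket.add(node)
            blanket.update(node_parents)         # co-parents of that child
    blanket.discard(target)                      # the target is not in its own blanket
    return blanket


# Hypothetical toy network (an assumption for illustration, not from the paper).
toy_dag = {
    "age": [],
    "income": ["age"],
    "loan_amount": [],
    "credit_class": ["income"],
    "late_payments": ["credit_class", "loan_amount"],
    "phone_type": [],
}

selected_features = markov_blanket(toy_dag, "credit_class")
print(sorted(selected_features))
```

In this toy network, `age` influences `credit_class` only through `income`, and `phone_type` is disconnected, so neither appears in the selected feature set.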
