Expert Systems With Applications

Abstract In recent years, the application of artificial intelligence methods to credit risk assessment has brought an improvement over classic methods. Even small improvements in credit scoring and bankruptcy prediction systems can yield large profits, so any improvement is of great interest to banks and financial institutions. Recent work shows that ensembles of classifiers achieve the best results for this kind of task. This paper extends a previous work on the selection of the best base classifier for ensembles on credit data sets. It shows that a very simple base classifier, based on imprecise probabilities and uncertainty measures, attains a better trade-off among aspects of interest for this type of study, such as accuracy and area under the ROC curve (AUC). The AUC can be considered the more appropriate measure in this setting, where the different types of errors have different costs or consequences. The results present this simple classifier as an interesting choice of base classifier in ensembles for credit scoring and bankruptcy prediction, and show that the individual performance of a classifier is not the only key point when selecting it for an ensemble scheme.
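As a rough illustration of why accuracy and AUC can disagree when error types carry different costs, the following minimal sketch computes both measures in plain Python. It is not the evaluation code of the paper: the function names, the 0.5 decision threshold, and the ten-applicant toy data set are all assumptions made up for this example. AUC is computed here through its pairwise-ranking interpretation, that is, the fraction of (bad, good) applicant pairs in which the bad applicant receives the higher risk score, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold(scores, cut=0.5):
    """Turn risk scores into hard labels; the 0.5 cut-off is an assumption."""
    return [1 if s >= cut else 0 for s in scores]


# Hypothetical toy data: 8 good applicants (0) and 2 defaulters (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Classifier A gives everyone the same low risk score (labels everyone "good").
scores_a = [0.1] * 10
# Classifier B ranks both defaulters above every good applicant,
# but two good applicants fall above the 0.5 cut-off.
scores_b = [0.1, 0.2, 0.15, 0.3, 0.6, 0.25, 0.05, 0.55, 0.7, 0.9]

acc_a, auc_a = accuracy(y_true, threshold(scores_a)), auc(y_true, scores_a)
acc_b, auc_b = accuracy(y_true, threshold(scores_b)), auc(y_true, scores_b)

print(f"A: accuracy={acc_a:.2f}  AUC={auc_a:.2f}")
print(f"B: accuracy={acc_b:.2f}  AUC={auc_b:.2f}")
```

On this toy data both classifiers reach the same accuracy, yet classifier A never detects a defaulter while classifier B ranks every defaulter above every good applicant. The AUC separates the two cases where accuracy does not.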
