Hybrid system with genetic algorithm and artificial neural networks and its application to retail credit risk assessment

Banks around the world have accumulated large quantities of information about clients and their financial and payment histories. These databases can be used for credit risk assessment, but they are commonly high dimensional, and irrelevant features in a training dataset can reduce the accuracy of classification. Data preprocessing is therefore required to prepare the data for classification and to increase predictive accuracy. Feature selection is a preprocessing technique commonly used on high-dimensional data; its purposes include reducing dimensionality, removing irrelevant and redundant features, facilitating data understanding, reducing the amount of data needed for learning, improving the predictive accuracy of algorithms, and increasing the interpretability of models. In this paper we investigate the extent to which the full set of data held by a bank can serve as a good basis for predicting a borrower's ability to repay a loan on time. We propose a feature selection technique for finding an optimum feature subset that enhances the classification accuracy of neural network classifiers. Experiments were conducted on a credit dataset collected at a Croatian bank to assess the accuracy of our technique. We found that the hybrid system with a genetic algorithm is competitive and can be used as a feature selection technique to discover the most significant features in determining the risk of default.
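To make the idea of genetic-algorithm feature selection concrete, the following is a minimal sketch of a wrapper-style search over binary feature masks. It is not the authors' implementation: the function name `ga_feature_selection`, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and all parameter values are illustrative assumptions. In the setting described above, the `fitness` callable would presumably return the validation accuracy of a neural network trained on the selected features; here a hypothetical `toy_fitness` stands in for that so the block stays self-contained.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Search for a high-fitness feature mask with a simple genetic algorithm.

    Returns the best mask found, its fitness, and the best-so-far fitness
    recorded at each generation. All operators and defaults are assumptions.
    """
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)
    ]
    best_mask: Mask = population[0][:]
    best_score = float("-inf")
    history: List[float] = []

    def tournament(scored: List[Tuple[float, Mask]]) -> Mask:
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(m), m) for m in population]
        gen_score, gen_mask = max(scored, key=lambda s: s[0])
        if gen_score > best_score:
            best_score, best_mask = gen_score, gen_mask[:]
        history.append(best_score)

        next_pop = [best_mask[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: toggle each gene with small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop

    return best_mask, best_score, history


def toy_fitness(mask: Mask) -> float:
    """Hypothetical stand-in for classifier accuracy.

    Rewards features 0, 2 and 5 (assumed informative) and penalises the rest.
    """
    informative = {0, 2, 5}
    hits = sum(mask[i] for i in informative)
    noise = sum(mask) - hits
    return hits - 0.5 * noise


if __name__ == "__main__":
    mask, score, history = ga_feature_selection(8, toy_fitness)
    print("selected features:", [i for i, g in enumerate(mask) if g])
    print("fitness:", score)
```

Passing the fitness function as a callable keeps the search loop independent of the classifier, so a neural network evaluation could replace `toy_fitness` without changing the genetic algorithm itself. Because each fitness call would then involve training a network, the population size and number of generations directly control the computational cost.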
