Data pre-processing by genetic algorithms for bankruptcy prediction

Bankruptcy prediction has been approached with data mining techniques. Data pre-processing, which includes feature selection (or dimensionality reduction) and data reduction, is a very important stage for successful data mining; however, very few studies perform both tasks to examine the impact of data pre-processing on prediction performance. This paper applies genetic algorithms, which have been widely used for data pre-processing tasks, to feature selection and data reduction over a public bankruptcy prediction dataset. In particular, experiments are conducted with different priorities, that is, different orders of performing feature selection and data reduction. The results show that performing data reduction alone allows the support vector machine (SVM) classifier to achieve the highest prediction accuracy. Executing both feature selection and data reduction performs the same regardless of which task is given priority: both orders greatly reduce the dataset size while keeping performance similar to that of the SVM without data pre-processing.
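The GA-based pre-processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the chromosome encoding, GA operators, or parameters, so the binary-mask encoding, binary tournament selection, one-point crossover, bit-flip mutation, elitism, population size, and fitness penalty weights below are all assumptions. A 1-nearest-neighbour classifier stands in for the SVM so the sketch needs only the Python standard library, and the hypothetical `make_data` helper generates toy data rather than the bankruptcy dataset. The same hypothetical `genetic_select` routine is used twice, once over feature bits and once over training-instance bits, here in the "feature selection first, then data reduction" order.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = List[int]  # 1 = keep the feature/instance, 0 = drop it


def genetic_select(n_bits: int, fitness: Callable[[Mask], float], pop_size: int = 20,
                   generations: int = 30, p_cross: float = 0.8, p_mut: float = 0.02,
                   seed: int = 0) -> Mask:
    """Evolve a binary mask that maximizes `fitness` (assumed GA settings)."""
    rng = random.Random(seed)
    cache: Dict[Tuple[int, ...], float] = {}

    def score(mask: Mask) -> float:
        key = tuple(mask)  # memoize: classifier-based fitness is expensive
        if key not in cache:
            cache[key] = fitness(mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=score)[:]
    for _ in range(generations):
        scores = [score(ind) for ind in pop]

        def tournament() -> Mask:
            # Binary tournament selection: the fitter of two random individuals.
            a, b = rng.sample(range(pop_size), 2)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children: List[Mask] = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_bits > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for i in range(n_bits):  # bit-flip mutation
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children[:pop_size]
        challenger = max(pop, key=score)
        if score(challenger) > score(best):
            best = challenger[:]
    return best


def nn_accuracy(train, test, feats: Sequence[int]) -> float:
    """1-NN accuracy on `test` using only the feature indices in `feats` (SVM stand-in)."""
    if not feats or not train:
        return 0.0
    correct = 0
    for x, y in test:
        _, pred = min((sum((x[j] - tx[j]) ** 2 for j in feats), ty) for tx, ty in train)
        correct += pred == y
    return correct / len(test)


def make_data(n: int, rng: random.Random):
    # Hypothetical toy data: features 0 and 1 carry the class signal, 2-5 are noise.
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [y + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(4)]
        rows.append((x, y))
    return rows


rng = random.Random(42)
train, valid = make_data(60, rng), make_data(40, rng)


def fs_fitness(mask: Mask) -> float:
    # Step 1 (feature selection): accuracy minus an assumed per-feature penalty.
    feats = [j for j, bit in enumerate(mask) if bit]
    return nn_accuracy(train, valid, feats) - 0.01 * len(feats)


feat_mask = genetic_select(6, fs_fitness, seed=1)
feats = [j for j, bit in enumerate(feat_mask) if bit]


def dr_fitness(mask: Mask) -> float:
    # Step 2 (data reduction): accuracy minus an assumed penalty on retained instances.
    kept = [row for row, bit in zip(train, mask) if bit]
    return nn_accuracy(kept, valid, feats) - 0.1 * len(kept) / len(train)


inst_mask = genetic_select(len(train), dr_fitness, seed=2)
reduced = [row for row, bit in zip(train, inst_mask) if bit]
print("features kept:", feats)
print("instances kept:", len(reduced), "of", len(train))
print("validation accuracy:", nn_accuracy(reduced, valid, feats))
```

Swapping the two steps (instance selection on all features first, then feature selection on the retained instances) gives the other priority order. Because the validation set drives the fitness here, the printed accuracy is optimistic; an evaluation like the paper's would score the selected subsets on held-out test data.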
