Statistics-based wrapper for feature selection: An implementation on financial distress identification with support vector machine

Abstract

Support vector machine (SVM) is an effective tool for financial distress identification (FDI). However, one issue that keeps SVM from being applied efficiently to financial distress identification is how to select features for SVM-based FDI. Filters are commonly employed, but this type of approach does not consider the predictive capability of the SVM itself when selecting features. This research is devoted to constructing a statistics-based wrapper for SVM-based FDI. The wrapper uses statistical indices of ranking-order information, which are derived from predictive performances under various parameters. The wrapper consists of four levels:

- the data level;
- the model level, based on SVM;
- the feature ranking-order level;
- the feature selection index level.

The procedure runs as follows:

- Once the data are prepared, the predictive accuracies of one type of SVM model are first calculated on various pairs of parameters. The model type is linear SVM (LSVM), polynomial SVM (PSVM), Gaussian SVM (GSVM), or sigmoid SVM (SSVM).
- Next, the performances of the SVM model on each candidate feature are converted into ranking-order indices.
- Two statistical indices, the mean and the standard deviation, are then calculated from the ranking-order information of each feature.
- Finally, the feature selection indices of SVM are produced by combining these statistical indices.

Each feature whose feature selection index is smaller than half of the average index is selected to compose the optimal feature set. We used a dataset collected for Chinese FDI three years prior to distress. On it, we statistically compared the statistics-based wrapper against a non-statistics-based wrapper, two filters, and no feature selection for SVM-based FDI. Results on an unseen dataset indicate that GSVM with the statistics-based wrapper significantly outperformed the other SVM models combined with the other feature selection methods. It also outperformed two wrapper-based classical statistical models.
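The ranking-order and selection steps described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation:

- It starts from a precomputed accuracy matrix `acc[p][f]`. Each entry is the accuracy of one SVM type (e.g., GSVM) under parameter pair `p` on candidate feature `f`. How those accuracies are produced is not shown. One option would be cross-validating scikit-learn's `SVC(kernel="rbf", C=..., gamma=...)` for each (C, gamma) pair.
- Within each parameter pair, rank 1 is assigned to the most accurate feature.
- It uses the population standard deviation.
- The mean and standard deviation are combined by simple addition. This rule is hypothetical, because the abstract only says the statistical indices are "combined".
- The helper names `rank_desc`, `combine_mean_sd`, and `statistics_based_wrapper`, and the toy accuracies, are illustrative.

```python
from statistics import mean, pstdev


def rank_desc(values):
    """Rank accuracies so the highest gets rank 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied accuracies
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def combine_mean_sd(m, s):
    # HYPOTHETICAL combination rule: the abstract does not state how mean and SD are combined.
    return m + s


def statistics_based_wrapper(acc, combine=combine_mean_sd):
    """acc[p][f]: accuracy of one SVM type under parameter pair p on candidate feature f.

    Returns (feature selection index per feature, list of selected feature positions).
    """
    n_features = len(acc[0])
    # Ranking-order level: rank the features within each parameter pair (1 = best)
    rank_rows = [rank_desc(row) for row in acc]
    ranks_per_feature = [[row[f] for row in rank_rows] for f in range(n_features)]
    # Statistical indices: mean and population standard deviation of each feature's ranks
    means = [mean(r) for r in ranks_per_feature]
    sds = [pstdev(r) for r in ranks_per_feature]
    # Feature selection index: smaller means the feature is ranked well and consistently
    index = [combine(m, s) for m, s in zip(means, sds)]
    # Selection rule from the abstract: keep features below half of the average index
    threshold = mean(index) / 2
    selected = [f for f, v in enumerate(index) if v < threshold]
    return index, selected


# Toy accuracy matrix (made-up numbers): 3 parameter pairs x 4 candidate features
example_acc = [
    [0.80, 0.70, 0.60, 0.55],
    [0.82, 0.72, 0.58, 0.60],
    [0.79, 0.65, 0.66, 0.50],
]

if __name__ == "__main__":
    index, selected = statistics_based_wrapper(example_acc)
    print([round(v, 3) for v in index])
    print(selected)
```

With these made-up accuracies the script prints `[1.0, 2.805, 3.816, 4.138]` and then `[0]`. The threshold is about 1.47, so only feature 0 is kept. Feature 1 is ranked second under every parameter pair, yet it is excluded. This shows how strict a "half of the average index" rule is when the indices are built from ranks.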
