A comparative study of saliency analysis and genetic algorithm for feature selection in support vector machines

Recently, the support vector machine (SVM) has received increasing attention in the field of regression estimation owing to its remarkable characteristics, such as good generalization performance, the absence of local minima, and a sparse representation of the solution. However, within the SVM framework there are very few established approaches for identifying important features. Selecting significant features from all candidate features is the first step in regression estimation, and this procedure can improve network performance, reduce network complexity, and speed up training. This paper investigates the use of saliency analysis (SA) and the genetic algorithm (GA) in SVMs for selecting important features in the context of regression estimation. SA measures the importance of a feature by evaluating the sensitivity of the network output with respect to that feature's input. The derivation of this sensitivity, expressed as the partial derivative of the SVM output with respect to the feature input, is presented, and a systematic approach to removing irrelevant features based on the sensitivity is developed. GA is an efficient search method based on the mechanics of natural selection and population genetics. A simple GA is used in which all features are mapped onto binary chromosomes, with a bit "1" representing the inclusion of a feature and a bit "0" representing its absence. The performance of SA and GA is tested on two simulated non-linear time series and five real financial time series. The experiments show that, on the simulated data, GA and SA detect the same true feature set from the redundant feature set, and SA is also insensitive to the choice of kernel function. On the real financial data, GA and SA select different subsets of features. Both selected feature sets achieve higher generalization performance in SVMs than the full feature set.
In addition, the generalization performance of the feature sets selected by GA and SA is similar. All the results demonstrate that both SA and GA are effective in SVMs for identifying important features.
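As a minimal sketch of the saliency idea (not the authors' exact implementation), consider support vector regression with an RBF kernel, where the output is f(x) = Σᵢ αᵢ K(sᵢ, x) + b over support vectors sᵢ. Since ∂K/∂xⱼ = 2γ(sᵢⱼ − xⱼ)K(sᵢ, x), the sensitivity of the output to feature j is available in closed form, and averaging its absolute value over the data gives a saliency score per feature. The data, hyperparameters, and the `saliency` helper below are illustrative assumptions, built on scikit-learn's fitted SVR attributes:

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data: y depends only on the first feature;
# the second feature is irrelevant noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0])

# gamma is set numerically so we can reuse it in the derivative below.
svr = SVR(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)

def saliency(svr, X):
    """Mean absolute partial derivative of the SVR output with
    respect to each input feature, averaged over the samples in X.

    For an RBF kernel K(s, x) = exp(-gamma * ||s - x||^2):
        dK/dx_j = 2 * gamma * (s_j - x_j) * K(s, x)
    """
    sv = svr.support_vectors_       # shape (n_sv, d)
    alpha = svr.dual_coef_.ravel()  # shape (n_sv,)
    gamma = svr.gamma               # numeric because we set it above
    sal = np.zeros(X.shape[1])
    for x in X:
        k = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))  # (n_sv,)
        grad = (2 * gamma * alpha * k) @ (sv - x)           # (d,)
        sal += np.abs(grad)
    return sal / len(X)

print(saliency(svr, X))  # the relevant feature should score much higher
```

A feature whose score is small relative to the others contributes little to the fitted function and is a candidate for removal, which is the basis of the systematic pruning procedure described above.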
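The binary-chromosome GA described above can likewise be sketched. Each chromosome is a bitstring over the candidate features; fitness is measured by the hold-out error of an SVR trained on the selected subset. The specific operators and parameters here (tournament selection, single-point crossover, bit-flip mutation, population size, generation count) are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(150, 4))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]   # features 2 and 3 are irrelevant
Xtr, Xva, ytr, yva = train_test_split(X, y, random_state=1)

def fitness(mask):
    """Negative hold-out MSE of an SVR trained on the selected features."""
    if not mask.any():
        return -np.inf                     # empty subsets are invalid
    svr = SVR(kernel="rbf", C=10.0, gamma="scale").fit(Xtr[:, mask], ytr)
    return -np.mean((svr.predict(Xva[:, mask]) - yva) ** 2)

def ga_select(n_feat, pop_size=16, gens=12, p_mut=0.1):
    # Chromosomes: bit "1" includes a feature, bit "0" excludes it.
    pop = rng.integers(0, 2, size=(pop_size, n_feat)).astype(bool)
    for _ in range(gens):
        fit = np.array([fitness(m) for m in pop])
        def pick():                        # binary tournament selection
            i, j = rng.integers(0, pop_size, 2)
            return pop[i] if fit[i] > fit[j] else pop[j]
        children = []
        for _ in range(pop_size):
            a, b = pick(), pick()
            cut = rng.integers(1, n_feat)  # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feat) < p_mut  # bit-flip mutation
            children.append(child ^ flip)
        children[0] = pop[np.argmax(fit)]  # elitism: keep the best so far
        pop = np.array(children)
    fit = np.array([fitness(m) for m in pop])
    return pop[np.argmax(fit)]

best = ga_select(4)
print(best)  # the relevant features should be selected
```

With only four candidate features the search space is tiny; the point of the sketch is the encoding and the operators, which carry over unchanged to the larger feature sets used in the experiments.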
