Feature Selection Using Probabilistic Prediction of Support Vector Regression

This paper presents a new wrapper-based feature selection method for support vector regression (SVR) that uses its probabilistic predictions. The method computes the importance of a feature by aggregating, over the feature space, the difference between the conditional density functions of the SVR prediction with and without the feature. As the exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure under these approximations is evaluated against several existing feature selection methods for SVR on both artificial and real-world problems. The experimental results show that the proposed method generally performs better than, or at least as well as, the existing methods, with a notable advantage when the dataset is sparse.
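The abstract does not give the importance measure or its approximations in closed form, so the following is only a minimal sketch of the general idea. It assumes Gaussian predictive densities for the SVR output (mean from the fitted model, a shared standard deviation estimated from held-out residuals), uses the KL divergence as the measure of difference between the two conditional densities, and approximates the aggregation over the feature space by averaging over a held-out sample. All function names, the density model, and the divergence choice are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

def gaussian_kl(mu0, s0, mu1, s1):
    """KL divergence KL( N(mu0, s0^2) || N(mu1, s1^2) ), elementwise."""
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

def feature_importance(X, y, feature, **svr_kw):
    """Wrapper-style importance of one feature (illustrative sketch):
    train an SVR with all features and one without `feature`, model each
    prediction as a Gaussian predictive density, and average the
    divergence between the two densities over a held-out sample as a
    Monte-Carlo stand-in for aggregating over the feature space."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    full = SVR(**svr_kw).fit(X_tr, y_tr)
    keep = [j for j in range(X.shape[1]) if j != feature]
    reduced = SVR(**svr_kw).fit(X_tr[:, keep], y_tr)

    # Shared predictive std per model, estimated from held-out residuals
    # (a simplification of a probabilistic SVR output).
    mu_full = full.predict(X_te)
    mu_red = reduced.predict(X_te[:, keep])
    s_full = np.std(y_te - mu_full) + 1e-12
    s_red = np.std(y_te - mu_red) + 1e-12

    return gaussian_kl(mu_full, s_full, mu_red, s_red).mean()
```

Under these assumptions, features would then be ranked by this score, a larger aggregated density difference indicating that removing the feature changes the predictive distribution more, i.e. that the feature is more important.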
