Feature selection for support vector regression using probabilistic prediction

This paper presents a novel wrapper-based feature selection method for Support Vector Regression (SVR) that exploits its probabilistic predictions. The method computes the importance of a feature by aggregating, over the feature space, the difference between the conditional density functions of the SVR prediction with and without that feature. Because exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure under these approximations is evaluated on both artificial and real-world problems and compared with several existing feature selection methods for SVR. The results show that the proposed method generally performs better than, or at least as well as, the existing methods, with a notable advantage when the data set is sparse.
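The abstract only sketches the importance measure, so the following is a minimal, illustrative Python sketch of the general idea rather than the paper's exact algorithm. It assumes a Gaussian predictive density N(f(x), sigma^2) with sigma estimated from residuals (in the spirit of Lin and Weng's simple probabilistic predictions for SVR), simulates "without the feature" by mean-imputing that feature's column, and Monte-Carlo-averages the L1 distance between the two densities over the training points. The helper names (fit_probabilistic_svr, feature_importance) and all of these modeling choices are assumptions; the paper's two approximations may differ.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.datasets import make_friedman1

def fit_probabilistic_svr(X, y):
    # Hypothetical helper: fit an SVR and attach a Gaussian predictive density.
    # sigma is estimated from in-sample residuals for brevity (an assumption;
    # Lin and Weng estimate it from out-of-sample residuals).
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
    residuals = y - model.predict(X)
    sigma = max(residuals.std(), 1e-6)
    return model, sigma

def gaussian_pdf(z, mu, sigma):
    # Density of N(mu, sigma^2) evaluated on the grid z.
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def feature_importance(X, y, j, n_grid=100):
    # Importance of feature j: average, over training points x, of the L1
    # distance between p(z | x) and p(z | x with feature j removed), where
    # removal is simulated by mean-imputing column j (an assumption).
    model, sigma = fit_probabilistic_svr(X, y)
    X_drop = X.copy()
    X_drop[:, j] = X[:, j].mean()
    mu_full = model.predict(X)
    mu_drop = model.predict(X_drop)
    z = np.linspace(y.min() - 3 * sigma, y.max() + 3 * sigma, n_grid)
    dz = z[1] - z[0]
    diffs = [
        np.sum(np.abs(gaussian_pdf(z, mf, sigma) - gaussian_pdf(z, md, sigma))) * dz
        for mf, md in zip(mu_full, mu_drop)
    ]
    return float(np.mean(diffs))

# Friedman #1 data: only the first five of ten features are informative.
X, y = make_friedman1(n_samples=200, n_features=10, random_state=0)
scores = [feature_importance(X, y, j) for j in range(X.shape[1])]
print(np.argsort(scores)[::-1])  # features ranked by estimated importance
```

Averaging over the training points stands in for the integral over the feature space, and the L1 distance between densities is just one plausible way to aggregate the difference; any divergence between the two conditional densities could be substituted without changing the overall wrapper structure.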
