Feature Selection Using SVM Probabilistic Outputs

A ranking criterion based on the posterior probability is proposed for feature selection with support vector machines (SVMs). This criterion has the advantage of being directly related to the importance of the features. Four approximations are proposed for evaluating the criterion, and their performance, when used in the recursive feature elimination (RFE) approach, is assessed on various artificial and real-world problems. Three of the proposed approximations perform well consistently, with one having a slight edge over the other two, and their performance compares favorably with that of feature selection methods in the literature.
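The posterior-based RFE procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact method: a hinge-loss SGD classifier stands in for an SVM, a fixed sigmoid stands in for Platt-fitted probabilistic outputs, and the ranking score is one plausible approximation of a posterior-based criterion — the mean change in the posterior when a feature's contribution is zeroed out. All function names here are illustrative.

```python
import math
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Hinge-loss SGD on a linear model -- a stand-in for a linear SVM."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for i in range(n):
            margin = y[i] * (sum(w[j] * X[i][j] for j in range(d)) + b)
            if margin < 1:  # point inside the margin: hinge gradient + shrinkage
                for j in range(d):
                    w[j] += lr * (y[i] * X[i][j] - lam * w[j])
                b += lr * y[i]
            else:           # correctly classified: regularization shrinkage only
                for j in range(d):
                    w[j] -= lr * lam * w[j]
    return w, b

def sigmoid(t):
    """Fixed sigmoid as a proxy for a Platt-calibrated posterior P(y=1|x)."""
    return 1.0 / (1.0 + math.exp(-t))

def posterior_criterion(X, w, b):
    """Score feature j by the mean posterior change when its term is removed."""
    d = len(w)
    scores = []
    for j in range(d):
        delta = 0.0
        for x in X:
            f = sum(w[k] * x[k] for k in range(d)) + b
            delta += abs(sigmoid(f) - sigmoid(f - w[j] * x[j]))
        scores.append(delta / len(X))
    return scores

def rfe(X, y, n_keep):
    """RFE loop: retrain, then drop the feature with the lowest score."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        Xa = [[x[j] for j in active] for x in X]
        w, b = train_linear_svm(Xa, y)
        scores = posterior_criterion(Xa, w, b)
        worst = min(range(len(active)), key=lambda i: scores[i])
        del active[worst]
    return active
```

On a toy problem where feature 0 tracks the label and feature 1 is pure noise, the noise feature receives a near-zero posterior-change score and is eliminated first, so `rfe(X, y, 1)` retains feature 0.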
