Direct convex relaxations of sparse SVM

Although support vector machines (SVMs) for binary classification yield a decision rule that depends only on a subset of the training points (the support vectors), that rule is, in general, based on all available features of the input space. We propose two novel, direct convex relaxations of a non-convex sparse SVM formulation that explicitly constrains the cardinality of the vector of feature weights. One relaxation results in a quadratically-constrained quadratic program (QCQP), while the second is based on a semidefinite programming (SDP) relaxation. The QCQP formulation can be interpreted as applying an adaptive soft-threshold to the SVM hyperplane, while the SDP formulation learns a weighted inner product (i.e., a kernel) that yields a sparse hyperplane. Experimental results show an increase in sparsity while preserving generalization performance relative to both a standard SVM and a linear-programming SVM.
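
For concreteness, the following is a minimal sketch of the kind of cardinality-constrained formulation the abstract refers to, assuming the standard soft-margin SVM objective; the feature-weight vector \(w\), offset \(b\), slacks \(\xi_i\), regularization parameter \(C\), and sparsity budget \(r\) are our notation for illustration, not necessarily the paper's:

\[
\min_{w,\, b,\, \xi} \;\; \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \quad
y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \|w\|_0 \le r,
\]

where \(y_i \in \{-1,+1\}\) are the labels and \(\|w\|_0\) counts the nonzero entries of \(w\). The cardinality constraint \(\|w\|_0 \le r\) is what makes the problem non-convex; the QCQP and SDP formulations described above arise from replacing it with convex surrogates. To illustrate the soft-threshold interpretation of the QCQP, note that an adaptive soft-threshold acts coordinate-wise on a hyperplane \(w\) as \(w_j \mapsto \operatorname{sign}(w_j)\max(|w_j| - \lambda_j,\, 0)\), zeroing out coordinates whose magnitude falls below a per-feature threshold \(\lambda_j\).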
