Direct convex relaxations of sparse SVM

Although support vector machines (SVMs) for binary classification yield a decision rule that depends only on a subset of the training points (the support vectors), that rule is, in general, based on all available features of the input space. We propose two novel, direct convex relaxations of a non-convex sparse SVM formulation that explicitly constrains the cardinality of the vector of feature weights. One relaxation results in a quadratically-constrained quadratic program (QCQP), while the second is based on a semidefinite programming (SDP) relaxation. The QCQP formulation can be interpreted as applying an adaptive soft-threshold to the SVM hyperplane, while the SDP formulation learns a weighted inner product (i.e., a kernel) that yields a sparse hyperplane. Experimental results show an increase in sparsity while preserving generalization performance relative to both a standard SVM and a linear-programming SVM.
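
For concreteness, the following is a minimal sketch of the kind of cardinality-constrained formulation the abstract refers to, assuming the standard soft-margin SVM objective; the feature-weight vector \(w\), offset \(b\), slacks \(\xi_i\), regularization parameter \(C\), and sparsity budget \(r\) are our notation for illustration, not necessarily the paper's:

\[
\min_{w,\, b,\, \xi} \;\; \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \quad
y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \|w\|_0 \le r,
\]

where \(y_i \in \{-1,+1\}\) are the labels and \(\|w\|_0\) counts the nonzero entries of \(w\). The cardinality constraint \(\|w\|_0 \le r\) is what makes the problem non-convex; the QCQP and SDP formulations described above arise from replacing it with convex surrogates. To illustrate the soft-threshold interpretation of the QCQP, note that an adaptive soft-threshold acts coordinate-wise on a hyperplane \(w\) as \(w_j \mapsto \operatorname{sign}(w_j)\max(|w_j| - \lambda_j,\, 0)\), zeroing out coordinates whose magnitude falls below a per-feature threshold \(\lambda_j\).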
