A New Perspective on Convex Relaxations of Sparse SVM

This paper proposes a convex relaxation of the sparse support vector machine (SVM) based on the perspective relaxation of mixed-integer nonlinear programs. We seek to minimize the zero-norm of the hyperplane normal vector together with a standard SVM hinge-loss penalty, and we extend our approach to a zero-one loss penalty. The proposed relaxation is a second-order cone formulation that can be solved efficiently by standard conic optimization solvers. We compare the optimization properties and classification performance of the second-order cone formulation with those of sparse SVM formulations previously suggested in the literature.
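
The abstract does not state the formulation explicitly; the following is a minimal sketch of the perspective-relaxation idea it invokes, written for a hypothetical mixed-integer sparse SVM with binary indicators that switch features on and off. The variable names (w, b, xi, z, s), the epigraph variables, and the exact objective weights are assumptions for illustration, not the paper's model.

\[
\begin{aligned}
\min_{w,\,b,\,\xi,\,z,\,s}\quad & \sum_{j=1}^{n} z_j \;+\; C \sum_{i=1}^{m} \xi_i \;+\; \lambda \sum_{j=1}^{n} s_j \\
\text{s.t.}\quad & y_i \bigl( w^{\top} x_i + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \\
& w_j^{2} \le s_j, \qquad w_j = 0 \ \text{whenever } z_j = 0, \qquad z_j \in \{0,1\}.
\end{aligned}
\]

The perspective reformulation strengthens the continuous relaxation by replacing the indicator logic on each pair $(w_j, z_j)$ with the perspective constraint
\[
w_j^{2} \le s_j\, z_j, \qquad 0 \le z_j \le 1,
\]
which is a rotated second-order cone constraint. Under this assumed structure, the relaxed problem is a second-order cone program, which is consistent with the abstract's claim that standard conic optimization solvers apply.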
