Mixed-Integer Support Vector Machine

In this paper, we propose a formulation of a feature-selecting support vector machine based on the L0-norm. We explore a perspective relaxation of the optimization problem and solve it using mixed-integer nonlinear programming (MINLP) techniques. Given a training set of labeled data instances, we construct a max-margin classifier that minimizes both the hinge loss and the cardinality ||w||_0 of the weight vector w of the separating hyperplane, so that feature selection and classification are performed simultaneously in a single optimization. We compare this relaxation with the standard SVM, recursive feature elimination (RFE), the L1-norm SVM, and two approximate L0-norm SVM methods, and show promising results on real-world datasets in terms of accuracy and sparsity.
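As a rough illustration of the objective described above, the following sketch evaluates a hinge loss plus a cardinality penalty for a fixed linear classifier. It is not the paper's MINLP formulation or its perspective relaxation. The function name `l0_svm_objective`, the trade-off weight `lam`, the tolerance `tol`, and the toy data are all assumptions introduced here for illustration.

```python
import numpy as np

def l0_svm_objective(w, b, X, y, lam=1.0, tol=1e-9):
    """Hinge loss plus lam * ||w||_0 for a linear classifier (illustrative only).

    `lam` and `tol` are hypothetical parameters, not taken from the paper.
    """
    margins = y * (X @ w + b)                      # signed margins y_i (w.x_i + b)
    hinge = np.maximum(0.0, 1.0 - margins).sum()   # total hinge loss
    cardinality = int(np.sum(np.abs(w) > tol))     # number of nonzero weights
    return hinge + lam * cardinality, hinge, cardinality

# Toy data (made up): only the first feature carries the class signal.
X = np.array([[2.0, 0.1], [1.5, -0.2], [-2.0, 0.3], [-1.0, -0.1]])
y = np.array([1.0, 1.0, -1.0, -1.0])

dense = l0_svm_objective(np.array([1.0, 0.5]), 0.0, X, y)
sparse = l0_svm_objective(np.array([1.0, 0.0]), 0.0, X, y)
print("dense :", dense)
print("sparse:", sparse)
```

On this toy data both weight vectors separate the classes with zero hinge loss, so the cardinality term alone favors the vector that uses a single feature. Because the cardinality term is a discontinuous count, an exact treatment typically models it with binary indicator variables, which is what makes the problem mixed-integer.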
