Feature selection in SVM via polyhedral k-norm

We address the feature selection problem in the Support Vector Machine (SVM) framework by adopting an optimization model based on the L0 pseudo-norm. The objective is to control the number of nonzero components of the normal vector to the separating hyperplane while maintaining satisfactory classification accuracy. In our model the polyhedral k-norm, intermediate between the L1 and L∞ norms, plays a significant role, allowing us to arrive at a DC (Difference of Convex) optimization problem that we tackle by means of a DCA-type algorithm. We report the results of several numerical experiments on benchmark classification datasets.
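To make the role of the polyhedral k-norm concrete: the vector k-norm of w is the sum of the k largest absolute components of w, and a vector w has at most k nonzero components exactly when its L1 norm equals its k-norm. The gap between the two is therefore a nonnegative DC (difference of two convex functions) penalty for the cardinality constraint. The sketch below, with hypothetical function names `k_norm` and `zero_norm_gap`, illustrates this identity numerically; it is an illustration of the underlying idea, not the paper's algorithm.

```python
import numpy as np

def k_norm(w, k):
    """Polyhedral (vector) k-norm: sum of the k largest absolute components.
    For k = 1 it reduces to the L-infinity norm; for k = len(w) to the L1 norm.
    Both terms of the DC penalty below are convex, piecewise-linear functions."""
    return np.sort(np.abs(w))[::-1][:k].sum()

def zero_norm_gap(w, k):
    """DC penalty ||w||_1 - |||w|||_[k]: nonnegative, and zero exactly
    when w has at most k nonzero components (i.e. ||w||_0 <= k)."""
    return np.abs(w).sum() - k_norm(w, k)

w = np.array([3.0, 0.0, -1.0, 0.0])  # two nonzero components
print(zero_norm_gap(w, 2))  # 0.0: w satisfies ||w||_0 <= 2
print(zero_norm_gap(w, 1))  # 1.0: w violates ||w||_0 <= 1
```

In a DCA scheme, the concave part (minus the k-norm) is linearized at the current iterate, so each subproblem is convex; the penalty drives the SVM normal vector toward at most k nonzero features.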
