Feature selection in SVM via polyhedral k-norm

We address the feature selection problem in the Support Vector Machine (SVM) framework by adopting an optimization model based on the L0 pseudo-norm. The objective is to control the number of nonzero components of the normal vector to the separating hyperplane while maintaining satisfactory classification accuracy. In our model the polyhedral k-norm, intermediate between the L1 and L∞ norms, plays a significant role, allowing us to arrive at a DC (Difference of Convex) optimization problem that we tackle by means of a DCA-type algorithm. We report the results of several numerical experiments on benchmark classification datasets.
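To make the role of the polyhedral k-norm concrete: the vector k-norm of w is the sum of the k largest absolute components of w, and a vector w has at most k nonzero components exactly when its L1 norm equals its k-norm. The gap between the two is therefore a nonnegative DC (difference of two convex functions) penalty for the cardinality constraint. The sketch below, with hypothetical function names `k_norm` and `zero_norm_gap`, illustrates this identity numerically; it is an illustration of the underlying idea, not the paper's algorithm.

```python
import numpy as np

def k_norm(w, k):
    """Polyhedral (vector) k-norm: sum of the k largest absolute components.
    For k = 1 it reduces to the L-infinity norm; for k = len(w) to the L1 norm.
    Both terms of the DC penalty below are convex, piecewise-linear functions."""
    return np.sort(np.abs(w))[::-1][:k].sum()

def zero_norm_gap(w, k):
    """DC penalty ||w||_1 - |||w|||_[k]: nonnegative, and zero exactly
    when w has at most k nonzero components (i.e. ||w||_0 <= k)."""
    return np.abs(w).sum() - k_norm(w, k)

w = np.array([3.0, 0.0, -1.0, 0.0])  # two nonzero components
print(zero_norm_gap(w, 2))  # 0.0: w satisfies ||w||_0 <= 2
print(zero_norm_gap(w, 1))  # 1.0: w violates ||w||_0 <= 1
```

In a DCA scheme, the concave part (minus the k-norm) is linearized at the current iterate, so each subproblem is convex; the penalty drives the SVM normal vector toward at most k nonzero features.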
