Double regularization methods for robust feature selection and SVM classification via DC programming

Abstract
In this work, two novel formulations for embedded feature selection are presented. A second-order cone programming (SOCP) formulation for Support Vector Machines (SVMs) is extended with a second regularizer that encourages feature elimination. Each of the two formulations combines one of two sparsity-inducing penalties, the one-norm or the zero-norm, with Tikhonov regularization. Both use a robust setting designed to classify instances correctly, up to a predefined error rate, even under the worst-case data distribution. Using the zero norm leads to a nonconvex formulation, which is solved with Difference of Convex (DC) functions programming, thereby extending DC programming to second-order cones. Experiments were performed on high-dimensional microarray datasets, and our approaches achieved the best performance compared with well-known feature selection methods for SVMs.
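The abstract states that the zero-norm formulation is handled with DC programming. The following is a minimal sketch of how a DC algorithm (DCA) treats a zero-norm penalty. It assumes the concave exponential surrogate ||w||_0 ≈ Σ_j (1 − exp(−α|w_j|)), a common choice in the zero-norm literature that is not necessarily the one used in this paper. This surrogate splits into α||w||_1 minus a convex function h, and DCA linearizes h at each iterate. To keep the example solver-free, the SOCP-SVM problem is replaced by a toy separable problem, min ½||w − z||² + λ·surrogate(w), whose convex subproblem is solved exactly by soft-thresholding. All function names and the parameters `lam` (λ) and `alpha` (α) are hypothetical and chosen for illustration.

```python
import math

# Illustrative sketch only: a toy least-squares problem stands in for the
# paper's SOCP-SVM formulation, and the zero-norm surrogate is an assumption.


def zero_norm_approx(w, alpha):
    """Concave surrogate of ||w||_0 (assumed form): sum_j (1 - exp(-alpha * |w_j|))."""
    return sum(1.0 - math.exp(-alpha * abs(wj)) for wj in w)


def grad_h(w, alpha):
    """Gradient of the convex part h(w) = sum_j (alpha*|w_j| - 1 + exp(-alpha*|w_j|)).

    With this split, zero_norm_approx(w) = alpha*||w||_1 - h(w). h is
    differentiable, and its derivative vanishes at w_j = 0.
    """
    return [alpha * (1.0 - math.exp(-alpha * abs(wj))) * math.copysign(1.0, wj) for wj in w]


def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrinks v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def objective(w, z, lam, alpha):
    """Toy nonconvex objective: 0.5*||w - z||^2 + lam * zero_norm_approx(w)."""
    fit = 0.5 * sum((wj - zj) ** 2 for wj, zj in zip(w, z))
    return fit + lam * zero_norm_approx(w, alpha)


def dca_zero_norm(z, lam, alpha, iters=50, tol=1e-10):
    """DCA on F = G - H with G(w) = 0.5*||w - z||^2 + lam*alpha*||w||_1 and H = lam*h.

    Each iteration replaces H by its linearization at the current iterate and
    minimizes the resulting convex function. Here that function is separable
    and is minimized in closed form by soft-thresholding.
    """
    w = list(z)  # start from the unpenalized minimizer
    history = [objective(w, z, lam, alpha)]
    for _ in range(iters):
        g = grad_h(w, alpha)
        # argmin_w 0.5*||w - z||^2 + lam*alpha*||w||_1 - lam*<g, w>
        w_new = [soft_threshold(zj + lam * gj, lam * alpha) for zj, gj in zip(z, g)]
        history.append(objective(w_new, z, lam, alpha))
        converged = max(abs(a - b) for a, b in zip(w_new, w)) < tol
        w = w_new
        if converged:
            break
    return w, history


if __name__ == "__main__":
    w, hist = dca_zero_norm([3.0, 0.4, -2.5, 0.1, -0.3], lam=0.5, alpha=5.0)
    print("solution:", [round(x, 4) for x in w])
    print("objective per iteration:", [round(f, 4) for f in hist])
```

With `lam=0.5` and `alpha=5.0`, the small entries 0.4, 0.1 and -0.3 are set exactly to zero within two iterations, while 3.0 and -2.5 are left almost unchanged. The objective decreases monotonically, as DCA guarantees. In the formulations described in the abstract, the closed-form step would presumably be replaced by a second-order cone program solved with a conic solver at each iteration.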
