A General Lp-norm Support Vector Machine via Mixed 0-1 Programming

Identifying a good feature subset that contributes most to the performance of Lp-norm Support Vector Machines (Lp-SVMs, with p=1 or p=2) is an important task. We observe that the Lp-SVMs do not comprehensively account for irrelevant and redundant features, because they treat all n features of the full set as important for training while skipping the other 2^n - 1 possible feature subsets. In previous work, we studied the L1-norm SVM and applied it to the feature selection problem. In this paper, we extend that research to the L2-norm SVM and propose to generalize the Lp-SVMs into one general Lp-norm Support Vector Machine (GLp-SVM) that takes into account all 2^n possible feature subsets. We formulate the GLp-SVM as a mixed 0-1 nonlinear programming (M01NLP) problem. We prove that solving this M01NLP optimization problem yields a smaller error penalty and enlarges the margin between the two supporting hyperplanes, thus possibly giving SVMs a better generalization capability than solving the traditional Lp-SVMs. Moreover, under the new formulation the sparsity of the GLp-SVM can easily be controlled by adding a linear constraint to the M01NLP optimization problem. To reduce the computational complexity of solving the M01NLP problem directly, we equivalently transform it into a mixed 0-1 linear programming (M01LP) problem if p=1, or into a mixed 0-1 quadratic programming (M01QP) problem if p=2. The M01LP and M01QP problems are then solved using the branch-and-bound algorithm. Experimental results on the UCI, LIBSVM, UNM and MIT Lincoln Lab datasets show that the proposed GLp-SVM outperforms the traditional Lp-SVMs, improving classification accuracy by more than 13.49%.
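To make the p=1 case concrete, the sketch below encodes an L1-norm SVM with binary feature-selection variables as a small mixed 0-1 linear program. This is an illustrative assumption, not the paper's own linearization: a big-M constraint gates each weight by its binary indicator z_j, a budget constraint sum(z) <= k plays the role of the paper's sparsity-controlling linear constraint, and SciPy's generic MILP solver (which uses branch and bound internally) stands in for the paper's algorithm. The toy data, the big-M value, and all variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = np.array([[1.0, 0.3], [2.0, -0.1], [-1.0, 0.2], [-2.0, -0.4]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, n = X.shape
C, M, k = 1.0, 100.0, 1  # error penalty, big-M bound, sparsity budget

# Variable layout: [w+ (n), w- (n), b (1), xi (m), z (n)], with w = w+ - w-,
# so the L1 norm of w is sum(w+) + sum(w-).
N = 2 * n + 1 + m + n
c = np.concatenate([np.ones(2 * n), [0.0], C * np.ones(m), np.zeros(n)])

# Margin constraints: y_i (w . x_i + b) + xi_i >= 1 for every sample i.
A1 = np.zeros((m, N))
A1[:, :n] = y[:, None] * X
A1[:, n:2 * n] = -y[:, None] * X
A1[:, 2 * n] = y
A1[np.arange(m), 2 * n + 1 + np.arange(m)] = 1.0
margin = LinearConstraint(A1, lb=1.0, ub=np.inf)

# Feature gating: w_j+ + w_j- <= M z_j, forcing w_j = 0 whenever z_j = 0.
A2 = np.zeros((n, N))
A2[np.arange(n), np.arange(n)] = 1.0
A2[np.arange(n), n + np.arange(n)] = 1.0
A2[np.arange(n), 2 * n + 1 + m + np.arange(n)] = -M
gate = LinearConstraint(A2, lb=-np.inf, ub=0.0)

# Sparsity: select at most k of the n features via one linear constraint.
A3 = np.zeros((1, N))
A3[0, 2 * n + 1 + m:] = 1.0
budget = LinearConstraint(A3, lb=-np.inf, ub=float(k))

lb = np.concatenate([np.zeros(2 * n), [-np.inf], np.zeros(m), np.zeros(n)])
ub = np.concatenate([np.full(2 * n, np.inf), [np.inf],
                     np.full(m, np.inf), np.ones(n)])
integrality = np.concatenate([np.zeros(2 * n + 1 + m), np.ones(n)])

res = milp(c=c, constraints=[margin, gate, budget],
           integrality=integrality, bounds=Bounds(lb, ub))
z = np.round(res.x[2 * n + 1 + m:])
print(res.success, z)  # the informative feature 0 should be selected
```

Because the selection variables z are explicit, restricting the feature subset (e.g. tightening k) only adds linear constraints, mirroring the paper's point that sparsity is easy to control in the 0-1 formulation.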
