On the sparseness of 1-norm support vector machines

Empirical evidence suggests that 1-norm Support Vector Machines (1-norm SVMs) yield sparse solutions; however, it is unclear how sparse these solutions can be and whether they are sparser than those of standard SVMs. In this paper we study the sparseness of 1-norm SVMs and present two upper bounds on the number of nonzero coefficients in their decision functions. First, the number of nonzero coefficients in a 1-norm SVM is at most the number of exact support vectors, i.e., the training samples lying exactly on the +1 and -1 discriminating surfaces, whereas in a standard SVM it equals the number of all support vectors; this implies that 1-norm SVMs are sparser than standard SVMs. Second, the number of nonzero coefficients is at most the rank of the sample matrix. We briefly review the geometry of linear programming and the primal steepest-edge pricing simplex method, which allows us to prove the two upper bounds and to evaluate their tightness experimentally. Results on toy data sets and UCI data sets support our analysis.
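As a concrete illustration of the linear-programming formulation whose sparseness the paper analyzes, the sketch below (in Python, using scipy.optimize.linprog with an RBF kernel; the function names fit_1norm_svm and rbf_kernel and all parameter values are illustrative assumptions, not the authors' implementation) solves the usual 1-norm SVM LP, min ||alpha||_1 + C sum_i xi_i subject to y_i (sum_j alpha_j K(x_i, x_j) + b) >= 1 - xi_i, xi_i >= 0, by splitting alpha and b into nonnegative parts. It then counts the nonzero coefficients, so the two upper bounds (number of exact support vectors; rank of the sample matrix) can be checked empirically on toy data.

```python
import numpy as np
from scipy.optimize import linprog

def rbf_kernel(X1, X2, gamma=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel
    d = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * d)

def fit_1norm_svm(X, y, C=1.0, gamma=1.0):
    """Solve the 1-norm SVM as a linear program (illustrative sketch)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Variables: [alpha_plus (n), alpha_minus (n), b_plus, b_minus, xi (n)], all >= 0
    c = np.concatenate([np.ones(n), np.ones(n), [0.0, 0.0], C * np.ones(n)])
    # Margin constraints y_i((a+ - a-).K_i + b+ - b-) + xi_i >= 1,
    # rewritten as -y_i(...) - xi_i <= -1 for linprog's A_ub x <= b_ub form
    YK = y[:, None] * K
    A_ub = np.hstack([-YK, YK, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (3 * n + 2)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    z = res.x
    alpha = z[:n] - z[n:2 * n]
    b = z[2 * n] - z[2 * n + 1]
    return alpha, b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 1, (40, 2)), rng.normal(+1, 1, (40, 2))])
    y = np.concatenate([-np.ones(40), np.ones(40)])
    alpha, b = fit_1norm_svm(X, y, C=10.0, gamma=0.5)
    nnz = int(np.sum(np.abs(alpha) > 1e-6))
    print("nonzero coefficients:", nnz, "of", len(alpha))
    print("rank of sample (kernel) matrix:",
          np.linalg.matrix_rank(rbf_kernel(X, X, 0.5)))
```

On such toy data, the printed count of nonzero coefficients should not exceed the rank of the kernel matrix, consistent with the second bound stated in the abstract; the tolerance 1e-6 for declaring a coefficient zero is an arbitrary choice for this sketch.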
