Comparison of $\ell_1$-Norm SVR and Sparse Coding Algorithms for Linear Regression

Support vector regression (SVR) is a popular function estimation technique based on Vapnik's support vector machine. Among its many variants, the $\ell_1$-norm SVR is known to select useful features well when the features are redundant. Sparse coding (SC) is a technique widely used in many areas, for which a number of efficient algorithms are available. Both $\ell_1$-norm SVR and SC can be used for linear regression. In this brief, the close connection between $\ell_1$-norm SVR and SC is revealed, and several typical algorithms are compared for linear regression. The results show that the SC algorithms are more efficient than the Newton linear programming algorithm, an efficient $\ell_1$-norm SVR solver. The algorithms are then used to design radial basis function (RBF) neural networks. Experiments on several benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, orthogonal matching pursuit (OMP), is two orders of magnitude faster than a well-known RBF network design algorithm, the orthogonal least squares (OLS) algorithm.
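
For concreteness, the connection alluded to above can be sketched with the standard formulations of the two problems (these are the textbook forms, assumed here rather than quoted from the brief). The $\ell_1$-norm SVR minimizes an $\varepsilon$-insensitive loss under an $\ell_1$ penalty on the weights:

$$\min_{w,\,b}\; \|w\|_1 + C\sum_{i=1}^{N}\max\bigl(0,\; |y_i - w^{\top}x_i - b| - \varepsilon\bigr)$$

while sparse coding, in its lasso/basis-pursuit-denoising form, minimizes $\ell_1$-penalized squared error over a dictionary $D$:

$$\min_{x}\; \tfrac{1}{2}\,\|y - Dx\|_2^2 + \lambda\,\|x\|_1$$

One natural way to read the connection (an illustration here, since the abstract does not spell out the mapping) is that with the dictionary $D$ set to the data matrix, both problems are $\ell_1$-regularized linear regressions, differing only in the loss term, which is why SC solvers can be brought to bear on the $\ell_1$-norm SVR problem.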

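The abstract applies OMP to RBF network design; below is a minimal sketch of that idea, assuming Gaussian basis functions with a shared width and the training inputs as the candidate-center pool. The function names, the width parameter, and the toy data are illustrative assumptions, not taken from the brief.

```python
import numpy as np

def rbf_design_matrix(X, centers, width):
    # Gaussian RBF design matrix: Phi[i, j] = exp(-||x_i - c_j||^2 / (2 width^2)).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def omp_select_centers(Phi, y, n_centers):
    # Greedy OMP over the columns of Phi (one column per candidate center):
    # pick the column most correlated with the current residual, refit all
    # selected columns by least squares (the "orthogonal" step), repeat.
    residual = y.astype(float).copy()
    selected, coef = [], None
    for _ in range(n_centers):
        corr = np.abs(Phi.T @ residual)
        corr[selected] = -np.inf  # never reselect a column
        selected.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Phi[:, selected], y, rcond=None)
        residual = y - Phi[:, selected] @ coef
    return selected, coef

# Toy usage: training inputs double as the candidate RBF centers.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.standard_normal(200)
Phi = rbf_design_matrix(X, X, width=0.3)
centers, weights = omp_select_centers(Phi, y, n_centers=10)
```

Each OMP iteration adds one hidden neuron (center) to the network; the least-squares refit at every step is what distinguishes OMP from plain matching pursuit and makes it a natural counterpart to the OLS design procedure it is compared against.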