Efficient Lasso training from a geometrical perspective

The Lasso (L1-penalized regression) has drawn great interest in machine learning and statistics due to its robustness and high accuracy. A variety of methods have been proposed for solving the Lasso, but for large-scale problems the L1 norm constraint significantly impedes their efficiency. Inspired by recent theoretical and practical work on the close relation between the Lasso and SVMs, we reformulate the Lasso as the problem of finding the point in a polytope nearest to the origin, which circumvents the L1 norm constraint. This problem can be solved efficiently from a geometric perspective using Wolfe's method. Compared with least angle regression (LARS), a conventional method for solving the Lasso, the proposed algorithm is advantageous in both efficiency and numerical stability. Experimental results show that the proposed approach is competitive with other state-of-the-art Lasso solvers on large-scale problems.
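
To make the reformulation concrete, here is a minimal Python/NumPy sketch (function names are hypothetical; this is not the authors' released code). It assumes the constrained form min_w ||Xw - y||_2^2 subject to ||w||_1 <= t, with the constraint active at the optimum. Substituting w = t(u - v) with u, v >= 0 and 1^T(u + v) = 1 makes the residual Xw - y a convex combination of the 2p points ±t·x_j - y (x_j the columns of X), so minimizing the residual norm amounts to finding the point of the polytope conv{±t·x_j - y} nearest the origin, which a basic version of Wolfe's (1976) nearest-point iteration then solves.

```python
import numpy as np

def lasso_polytope(X, y, t):
    """Vertices (columns) +/- t*x_j - y of the polytope implied by the
    Lasso/SVM equivalence; assumes ||w||_1 <= t is active at the optimum."""
    return np.hstack([t * X, -t * X]) - y[:, None]

def min_norm_point(P, tol=1e-9, max_iter=1000):
    """Basic sketch of Wolfe's method: the minimum-norm point in the
    convex hull of the columns of P. A production solver needs more
    numerical safeguards than are shown here."""
    d, n = P.shape
    S = [int(np.argmin((P * P).sum(axis=0)))]   # start from the shortest vertex
    w = np.array([1.0])                         # convex weights over the corral S
    x = P[:, S] @ w
    for _ in range(max_iter):
        j = int(np.argmin(x @ P))               # vertex most opposed to x
        if x @ P[:, j] > x @ x - tol:           # Wolfe's optimality test
            break
        if j in S:                              # numerical stall: stop early
            break
        S.append(j)
        w = np.append(w, 0.0)
        while True:
            # Minor cycle: min-norm point of the affine hull of the corral,
            # i.e. solve A^T A v + mu*1 = 0 with 1^T v = 1.
            A = P[:, S]
            k = len(S)
            M = np.zeros((k + 1, k + 1))
            M[0, 1:] = 1.0
            M[1:, 0] = 1.0
            M[1:, 1:] = A.T @ A
            rhs = np.zeros(k + 1)
            rhs[0] = 1.0
            v = np.linalg.solve(M, rhs)[1:]
            if np.all(v > tol):                 # affine optimum lies in the simplex
                w = v
                break
            # Otherwise step toward v until a weight hits zero, then drop it.
            mask = v <= tol
            theta = np.min(w[mask] / (w[mask] - v[mask]))
            w = (1.0 - theta) * w + theta * v
            keep = w > tol
            S = [s for s, m in zip(S, keep) if m]
            w = w[keep] / w[keep].sum()
        x = P[:, S] @ w
    z = np.zeros(n)                             # expand corral weights to all vertices
    z[S] = w
    return x, z

def lasso_via_wolfe(X, y, t):
    """Recover Lasso coefficients w = t*(u - v) from the nearest-point weights."""
    P = lasso_polytope(X, y, t)
    _, z = min_norm_point(P)
    p = X.shape[1]
    return t * (z[:p] - z[p:])
```

For instance, `lasso_via_wolfe(X, y, t=1.0)` solves the constraint form for a given L1 budget t; translating a penalty-form parameter into the matching budget, handling an inactive constraint, and the remaining safeguards of Wolfe's full method are left out of this sketch.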
