Mind the duality gap: safer rules for the Lasso

Screening rules make Lasso solvers faster by discarding irrelevant variables early in the optimization, both for the Lasso itself and for its variants. In this paper, we propose new versions of the so-called $\textit{safe rules}$ for the Lasso. Based on duality gap considerations, our new rules create safe test regions whose diameters converge to zero, provided that one relies on a converging solver. This property helps screen out more variables, over a wider range of regularization parameter values. In addition to faster convergence, we prove that these rules correctly identify the active sets (supports) of the solutions in finite time. Although the proposed strategy can be combined with any solver, we demonstrate its performance with a coordinate descent algorithm that is particularly well suited to machine learning use cases. Significant reductions in computing time are obtained with respect to previous safe rules.
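To make the idea concrete, here is a minimal sketch of a duality-gap-based safe screening test for the Lasso $\tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$. It is not the authors' implementation; the function name and interface are illustrative. The key steps are: build a dual-feasible point by rescaling the residual, compute the duality gap at the current iterate, derive a safe sphere whose radius shrinks with the gap, and discard every feature whose correlation with the dual point stays below one over that sphere.

```python
import numpy as np

def gap_safe_screen(X, y, beta, lmbda):
    """Gap-safe-style screening test for the Lasso
    (1/2) * ||y - X beta||^2 + lmbda * ||beta||_1.

    Returns a boolean mask of features that can be safely discarded
    given the current primal iterate ``beta`` (illustrative sketch).
    """
    r = y - X @ beta                                   # residual of the current iterate
    # Dual-feasible point obtained by rescaling the residual
    theta = r / max(lmbda, np.max(np.abs(X.T @ r)))
    # Primal and dual objectives, and the resulting duality gap
    primal = 0.5 * r @ r + lmbda * np.sum(np.abs(beta))
    dual = 0.5 * y @ y - 0.5 * lmbda ** 2 * np.sum((theta - y / lmbda) ** 2)
    gap = max(primal - dual, 0.0)
    # Radius of the safe sphere centered at theta; it goes to zero as the gap does
    radius = np.sqrt(2.0 * gap) / lmbda
    # Feature j is safely inactive if |x_j^T theta| + radius * ||x_j|| < 1
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0
```

In practice such a test would be called every few epochs inside a coordinate descent loop, restricting subsequent updates to the columns of X that are not flagged by the returned mask; because the sphere radius decreases with the duality gap, more and more features get screened out as the solver converges.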
