Strong rules for discarding predictors in lasso‐type problems

Summary.  We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui and his colleagues have proposed ‘SAFE’ rules, based on univariate inner products between each predictor and the outcome, which guarantee that a coefficient will be 0 in the solution vector. This provides a reduction in the number of variables that need to be entered into the optimization. We propose strong rules that are very simple and yet screen out far more predictors than the SAFE rules. This great practical improvement comes at a price: the strong rules are not foolproof and can mistakenly discard active predictors, i.e. predictors that have non‐zero coefficients in the solution. We therefore combine them with simple checks of the Karush–Kuhn–Tucker conditions to ensure that the exact solution to the convex problem is delivered. Of course, any (approximate) screening method can be combined with the Karush–Kuhn–Tucker conditions to ensure the exact solution; the strength of the strong rules lies in the fact that, in practice, they discard a very large number of the inactive predictors and almost never commit mistakes. We also derive conditions under which they are foolproof. Strong rules provide substantial savings in computational time for a variety of statistical optimization problems.
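To make the proposal concrete for the squared-error lasso with standardized predictors: the sequential strong rule discards predictor j at a path value λ_k whenever |x_j^T r(λ_{k-1})| < 2λ_k − λ_{k-1}, where r(λ_{k-1}) is the residual at the previous, larger path value, and the Karush–Kuhn–Tucker conditions require |x_j^T (y − Xβ̂)| ≤ λ for every predictor at a solution β̂. The sketch below is a minimal numpy illustration of these two ingredients; the function names and interface are our own, not from the paper, and a lasso solver for the surviving predictors is assumed to be supplied elsewhere.

```python
import numpy as np

def strong_rule_survivors(X, y, beta_prev, lam_prev, lam):
    """Sequential strong rule: keep predictor j at lam (< lam_prev) only if
    |x_j' r| >= 2*lam - lam_prev, where r is the residual at lam_prev.
    This is a heuristic screen; it can, rarely, discard an active predictor."""
    r = y - X @ beta_prev              # residual at the previous path value
    c = np.abs(X.T @ r)                # inner product of each predictor with r
    return np.flatnonzero(c >= 2.0 * lam - lam_prev)

def kkt_violators(X, y, beta, lam, tol=1e-8):
    """KKT check for the lasso: a solution must satisfy
    |x_j'(y - X beta)| <= lam for all j. Returns the indices that violate
    this bound; any such (wrongly discarded) predictors are added back
    and the problem is re-solved."""
    c = np.abs(X.T @ (y - X @ beta))
    return np.flatnonzero(c > lam + tol)
```

A path algorithm would solve the lasso at λ_k restricted to the survivors, run the KKT check over all predictors, and re-solve with any violators added back; as the summary notes, violations are rare in practice, so this loop almost always finishes in one pass.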

[1] Ken Lang. NewsWeeder: Learning to Filter Netnews. International Conference on Machine Learning (ICML), 1995.

[2] R. Tibshirani. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, 1996.

[3] Michael A. Saunders et al. Atomic Decomposition by Basis Pursuit. SIAM Journal on Scientific Computing, 1998.

[4] R. Tibshirani et al. Least Angle Regression. Annals of Statistics, 2004 (arXiv:math/0406456).

[5] Jean-Jacques Fuchs. Recovery of Exact Sparse Representations in the Presence of Noise. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.

[6] Jean-Jacques Fuchs. Recovery of Exact Sparse Representations in the Presence of Bounded Noise. IEEE Transactions on Information Theory, 2005.

[7] H. Zou et al. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, 2005.

[8] Jianqing Fan et al. Sure Independence Screening for Ultrahigh Dimensional Feature Space. Journal of the Royal Statistical Society, Series B, 2008 (arXiv:math/0612857).

[9] N. Meinshausen et al. High-dimensional Graphs and Variable Selection with the Lasso. Annals of Statistics, 2006 (arXiv:math/0608017).

[10] Joel A. Tropp. Just Relax: Convex Programming Methods for Identifying Sparse Signals in Noise. IEEE Transactions on Information Theory, 2006.

[11] M. Yuan et al. Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society, Series B, 2006.

[12] Peng Zhao et al. On Model Selection Consistency of Lasso. Journal of Machine Learning Research, 2006.

[13] R. Tibshirani et al. Pathwise Coordinate Optimization. Annals of Applied Statistics, 2007 (arXiv:0708.1485).

[14] Stephen P. Boyd et al. An Interior-Point Method for Large-Scale ℓ1-Regularized Logistic Regression. Journal of Machine Learning Research, 2007.

[15] C. Robert. Discussion of 'Sure Independence Screening for Ultrahigh Dimensional Feature Space' by Fan and Lv, 2008.

[16] Jeffrey S. Morris et al. Discussion of 'Sure Independence Screening for Ultrahigh Dimensional Feature Space', 2008.

[17] E. Candès et al. Near-ideal Model Selection by ℓ1 Minimization. Annals of Statistics, 2009 (arXiv:0801.0345).

[18] Martin J. Wainwright. Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using ℓ1-Constrained Quadratic Programming (Lasso). IEEE Transactions on Information Theory, 2009.

[19] Trevor J. Hastie et al. Genome-wide Association Analysis by Lasso Penalized Logistic Regression. Bioinformatics, 2009.

[20] Laurent El Ghaoui et al. Safe Feature Elimination for the LASSO and Sparse Supervised Learning Problems. arXiv:1009.4219, 2010.

[21] Laurent El Ghaoui et al. Safe Feature Elimination in Sparse Supervised Learning. arXiv preprint, 2010.

[22] J. Friedman et al. New Insights and Faster Computations for the Graphical Lasso. Journal of Computational and Graphical Statistics, 2011.

[23] R. Tibshirani et al. The Solution Path of the Generalized Lasso. Annals of Statistics, 2011 (arXiv:1005.1971).

[24] Trevor J. Hastie et al. Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso. Journal of Machine Learning Research, 2011.