Model-Consistent Sparse Estimation through the Bootstrap

We consider the least-squares linear regression problem with regularization by the $\ell^1$-norm, a problem usually referred to as the Lasso. In this paper, we first present a detailed asymptotic analysis of model consistency of the Lasso in low-dimensional settings. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection. For a specific rate of decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects each of the other variables with strictly positive probability. We show that this property implies that if we run the Lasso on several bootstrap replications of a given sample, then intersecting the supports of the resulting Lasso estimates leads to consistent model selection. This novel variable selection procedure, referred to as the Bolasso, is extended to high-dimensional settings by a provably consistent two-step procedure.
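
To make the core Bolasso step concrete, the sketch below implements the procedure the abstract describes: fit the Lasso on bootstrap resamples of the data and intersect the selected supports. It is a minimal illustration using scikit-learn's `Lasso`; the regularization strength `alpha`, the number of replicates `n_bootstraps`, and the synthetic data are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the Bolasso support-intersection step.
# Assumptions (not from the paper): alpha=0.1, n_bootstraps=128,
# and the synthetic data generated below.
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_support(X, y, alpha=0.1, n_bootstraps=128, seed=0):
    """Indices of variables selected by every bootstrap Lasso run."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    support = np.ones(p, dtype=bool)  # start with all variables selected
    for _ in range(n_bootstraps):
        idx = rng.integers(0, n, size=n)           # resample n rows with replacement
        lasso = Lasso(alpha=alpha).fit(X[idx], y[idx])
        support &= lasso.coef_ != 0                # keep only variables selected every time
    return np.flatnonzero(support)

# Example: sparse ground truth where only the first 3 of 10 variables are active.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
w = np.zeros(10)
w[:3] = [1.0, -2.0, 1.5]
y = X @ w + 0.1 * rng.standard_normal(200)
print(bolasso_support(X, y))  # typically prints [0 1 2]
```

The sketch stops at support recovery; a natural final step, in the spirit of the two-step procedure mentioned above, would be to refit the coefficients on the intersected support, e.g., by unregularized least squares restricted to the selected variables.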
