Thresholded Lasso for high dimensional variable selection and statistical estimation

Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that a multi-step thresholding procedure based on the Lasso, which we call the {\it Thresholded Lasso}, can accurately estimate a sparse vector $\beta \in \mathbb{R}^p$ in the linear model $Y = X \beta + \epsilon$, where $X_{n \times p}$ is a design matrix normalized to have column $\ell_2$ norm $\sqrt{n}$ and $\epsilon \sim N(0, \sigma^2 I_n)$. We show that under the restricted eigenvalue (RE) condition (Bickel-Ritov-Tsybakov 09), the Thresholded Lasso achieves an $\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an {\em oracle}, while selecting a sufficiently sparse model -- hence achieving {\it sparse oracle inequalities}; here the oracle supplies perfect information about which coordinates are non-zero and which are above the noise level. In this sense, the Thresholded Lasso recovers the choices that would have been made by the $\ell_0$-penalized least squares estimators: it selects a sufficiently sparse model without sacrificing accuracy in estimating $\beta$ or in predicting $X \beta$. We also show that for the Gauss-Dantzig selector (Cand\`{e}s-Tao 07), if $X$ obeys a uniform uncertainty principle and the true parameter is sufficiently sparse, one achieves the sparse oracle inequalities above while allowing at most $s_0$ irrelevant variables in the model in the worst case, where $s$ denotes the number of non-zero coordinates of $\beta$ and $s_0 \leq s$ is the smallest integer such that, for $\lambda = \sqrt{2 \log p/n}$, $\sum_{i=1}^p \min(\beta_i^2, \lambda^2 \sigma^2) \leq s_0 \lambda^2 \sigma^2$. Our simulation results for the Thresholded Lasso closely match our theoretical analysis.
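
As a concrete illustration of the procedure, the sketch below fits an initial Lasso, hard-thresholds the estimated coefficients, and refits ordinary least squares on the selected support. This is a minimal sketch rather than the paper's exact prescription: the penalty level $\lambda = \sigma\sqrt{2 \log p / n}$ follows the abstract, but the threshold constant, the synthetic data, and the use of scikit-learn's Lasso solver are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic instance of the linear model Y = X beta + epsilon described above.
rng = np.random.default_rng(0)
n, p, s, sigma = 200, 1000, 10, 1.0

X = rng.standard_normal((n, p))
X *= np.sqrt(n) / np.linalg.norm(X, axis=0)   # columns scaled to ell_2 norm sqrt(n)
beta = np.zeros(p)
beta[:s] = rng.uniform(1.0, 3.0, size=s)
y = X @ beta + sigma * rng.standard_normal(n)

# Step 1: initial Lasso fit with penalty of order sigma * sqrt(2 log p / n).
lam = sigma * np.sqrt(2.0 * np.log(p) / n)
beta_init = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y).coef_

# Step 2: hard-threshold the Lasso estimate; the threshold level (here lam itself)
# is an assumed constant, not the paper's tuned choice.
support = np.flatnonzero(np.abs(beta_init) > lam)

# Step 3: refit ordinary least squares on the selected support.
beta_hat = np.zeros(p)
if support.size:
    beta_hat[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)

print("selected support:", support)
print("ell_2 estimation error:", np.linalg.norm(beta_hat - beta))
```

In practice $\sigma$ is unknown, so the penalty and threshold levels would instead be set from an estimate of the noise level or by cross-validation.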

[1] Stanislaw J. Szarek et al. Condition numbers of random matrices. J. Complex., 1991.

[2] I. Johnstone et al. Ideal spatial adaptation by wavelet shrinkage, 1994.

[3] Dean P. Foster et al. The risk inflation criterion for multiple regression, 1994.

[4] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[5] P. Massart et al. From Model Selection to Adaptive Estimation, 1997.

[6] Michael A. Saunders et al. Atomic Decomposition by Basis Pursuit. SIAM J. Sci. Comput., 1998.

[7] P. Massart et al. Risk bounds for model selection via penalization, 1999.

[8] P. Massart et al. Gaussian model selection, 2001.

[9] R. Tibshirani et al. Least angle regression. arXiv:math/0406456, 2004.

[10] Y. Ritov et al. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization, 2004.

[11] E. Candès et al. Stable signal recovery from incomplete and inaccurate measurements. arXiv:math/0503066, 2005.

[12] Emmanuel J. Candès et al. Decoding by linear programming. IEEE Transactions on Information Theory, 2005.

[13] D. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution, 2006.

[14] D. Donoho. For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution, 2006.

[15] S. Mendelson et al. Uniform Uncertainty Principle for Bernoulli and Subgaussian Ensembles. arXiv:math/0608665, 2006.

[16] N. Meinshausen et al. High-dimensional graphs and variable selection with the Lasso. arXiv:math/0608017, 2006.

[17] Martin J. Wainwright et al. Sharp thresholds for high-dimensional and noisy recovery of sparsity. arXiv, 2006.

[18] H. Zou. The Adaptive Lasso and Its Oracle Properties, 2006.

[19] David L. Donoho et al. Compressed sensing. IEEE Transactions on Information Theory, 2006.

[20] Emmanuel J. Candès et al. Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies? IEEE Transactions on Information Theory, 2004.

[21] M. Rudelson et al. Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements. 40th Annual Conference on Information Sciences and Systems, 2006.

[22] Peng Zhao et al. On Model Selection Consistency of Lasso. J. Mach. Learn. Res., 2006.

[23] Florentina Bunea et al. Sparse Density Estimation with ℓ1 Penalties. COLT, 2007.

[24] A. Tsybakov et al. Aggregation for Gaussian regression. arXiv:0710.3654, 2007.

[25] Terence Tao et al. The Dantzig selector: Statistical estimation when p is much larger than n. arXiv:math/0506081, 2005.

[26] Deanna Needell et al. Signal recovery from incomplete and inaccurate measurements via ROMP, 2007.

[27] A. Tsybakov et al. Sparsity oracle inequalities for the Lasso. arXiv:0705.3308, 2007.

[28] S. van de Geer. High-dimensional generalized linear models and the Lasso. arXiv:0804.0703, 2008.

[29] Karim Lounici. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. arXiv:0801.4610, 2008.

[30] Cun-Hui Zhang et al. Adaptive Lasso for sparse high-dimensional regression models, 2008.

[31] R. DeVore et al. A Simple Proof of the Restricted Isometry Property for Random Matrices, 2008.

[32] Cun-Hui Zhang et al. The sparsity and bias of the Lasso selection in high-dimensional linear regression. arXiv:0808.0967, 2008.

[33] V. Koltchinskii. The Dantzig selector and sparsity oracle inequalities. arXiv:0909.0861, 2009.

[34] L. Wasserman et al. High dimensional variable selection. Annals of Statistics, 2007.

[35] J. Tropp et al. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Commun. ACM, 2008.

[36] N. Meinshausen et al. Lasso-type recovery of sparse representations for high-dimensional data. arXiv:0806.0145, 2008.

[37] R. Adamczak et al. Restricted Isometry Property of Matrices with Independent Columns and Neighborly Polytopes by Random Sampling. arXiv:0904.4723, 2009.

[38] S. van de Geer et al. On the conditions used to prove oracle results for the Lasso. arXiv:0910.0722, 2009.

[39] Martin J. Wainwright et al. Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Transactions on Information Theory, 2009.

[40] E. Candès et al. Near-ideal model selection by ℓ1 minimization. arXiv:0801.0345, 2008.

[41] Tong Zhang. Some sharp performance bounds for least squares regression with L1 regularization. arXiv:0908.2869, 2009.

[42] Shuheng Zhou et al. Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation. NIPS, 2009.

[43] V. Koltchinskii. Sparsity in penalized empirical risk minimization, 2009.

[44] Shuheng Zhou. Restricted Eigenvalue Conditions on Subgaussian Random Matrices. arXiv:0912.4045, 2009.

[45] P. Bickel et al. Simultaneous analysis of Lasso and Dantzig selector. arXiv:0801.1095, 2008.

[46] Martin J. Wainwright et al. Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell_1$-Constrained Quadratic Programming (Lasso). IEEE Transactions on Information Theory, 2009.

[47] S. van de Geer et al. Adaptive Lasso for High Dimensional Regression and Gaussian Graphical Modeling. arXiv:0903.2515, 2009.

[48] S. van de Geer et al. The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso), 2011.

[49] J. Lafferty et al. High-dimensional Ising model selection using ℓ1-regularized logistic regression. arXiv:1010.0311, 2010.

[50] Lie Wang et al. Stable Recovery of Sparse Signals and an Oracle Inequality. IEEE Transactions on Information Theory, 2010.

[51] Sara van de Geer et al. Prediction and variable selection with the adaptive Lasso, 2010.

[52] Martin J. Wainwright et al. Minimax Rates of Estimation for High-Dimensional Linear Regression Over $\ell_q$-Balls. IEEE Transactions on Information Theory, 2009.