Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming

We propose a pivotal method for estimating high-dimensional sparse linear regression models, where the overall number of regressors p is large, possibly much larger than the sample size n, but only s regressors are significant. The method is a modification of the lasso, called the square-root lasso. The method is pivotal in that it neither relies on knowledge of the noise standard deviation σ nor needs to pre-estimate it. Moreover, the method does not rely on normality or sub-Gaussianity of the noise. It achieves near-oracle performance, attaining the convergence rate σ{(s/n) log p}^{1/2} in the prediction norm, and thus matching the performance of the lasso with known σ. These performance results are valid for both Gaussian and non-Gaussian errors, under mild moment restrictions. We formulate the square-root lasso as the solution to a convex conic programming problem, which allows us to implement the estimator using efficient algorithmic methods, such as interior-point and first-order methods.
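The estimator minimizes the un-squared residual norm plus an l1 penalty, ||y − Xβ||₂/√n + (λ/n)||β||₁, which is a second-order cone program. Below is a minimal sketch in Python using the generic convex solver cvxpy, not the authors' own implementation; the function name sqrt_lasso is ours, and the penalty rule λ = c√n Φ⁻¹(1 − α/(2p)) with c = 1.1 and α = 0.05 is the pivotal choice discussed in the paper, used here as illustrative defaults.

```python
# A sketch of the square-root lasso as a second-order cone program, written
# with the generic convex solver cvxpy (an assumption of this illustration;
# the paper itself uses interior-point and first-order conic methods).
import numpy as np
import cvxpy as cp
from scipy.stats import norm


def sqrt_lasso(X, y, c=1.1, alpha=0.05):
    """Square-root lasso estimate of beta; name and defaults are illustrative."""
    n, p = X.shape
    # Pivotal penalty: depends only on (n, p, alpha), not on the noise level sigma.
    lam = c * np.sqrt(n) * norm.ppf(1 - alpha / (2 * p))
    beta = cp.Variable(p)
    # Objective: ||y - X beta||_2 / sqrt(n) + (lam / n) * ||beta||_1.
    # The residual norm is NOT squared; this is what removes sigma from the
    # optimal penalty level relative to the ordinary lasso.
    objective = cp.norm(y - X @ beta, 2) / np.sqrt(n) + (lam / n) * cp.norm(beta, 1)
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value


# Toy usage: n = 100 observations, p = 500 regressors, s = 5 significant ones.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))
beta0 = np.zeros(500)
beta0[:5] = 1.0
y = X @ beta0 + rng.standard_normal(100)
beta_hat = sqrt_lasso(X, y)
```

Because the un-squared residual norm is the epigraph of a second-order cone constraint, off-the-shelf conic solvers apply directly, which is the computational payoff of the conic formulation.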
