Pivotal estimation via square-root Lasso in nonparametric regression

We propose a self-tuning $\sqrt{\mathrm{Lasso}}$ method that simultaneously resolves three important practical problems in high-dimensional regression analysis: it handles unknown scale, heteroscedasticity, and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme cases, such as the infinite-variance case and the noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds for $\sqrt{\mathrm{Lasso}}$, including bounds on its prediction-norm rate of convergence and on its sparsity. Our analysis is based on new impact factors tailored to bounding the prediction norm. To cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums, which yields Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least squares (ols) applied to the model selected by $\sqrt{\mathrm{Lasso}}$, accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of ols post $\sqrt{\mathrm{Lasso}}$ is as good as that of $\sqrt{\mathrm{Lasso}}$ itself. As an application, we consider the use of $\sqrt{\mathrm{Lasso}}$ and ols post $\sqrt{\mathrm{Lasso}}$ as estimators of nuisance parameters in a generic semiparametric problem (a nonlinear moment-condition or $Z$-problem), yielding $\sqrt{n}$-consistent and asymptotically normal estimators of the main parameters.
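For concreteness, here is a minimal sketch of the $\sqrt{\mathrm{Lasso}}$ estimator, $\widehat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \sqrt{n^{-1}\sum_{i=1}^{n}(y_i - x_i'\beta)^2} + (\lambda/n)\|\beta\|_1$, cast as a conic program in Python with cvxpy. This is not the authors' implementation: the penalty level $\lambda = c\sqrt{n}\,\Phi^{-1}(1-\alpha/(2p))$ shown is the standard pivotal choice for the homoscedastic Gaussian benchmark (the point of "self-tuning" being that $\lambda$ involves no estimate of the noise scale $\sigma$), and the constants `c=1.1` and `alpha=0.05` are illustrative defaults, not values prescribed by this paper.

```python
# Sketch of the square-root Lasso as a second-order cone program via cvxpy.
# The penalty lam = c * sqrt(n) * Phi^{-1}(1 - alpha/(2p)) is pivotal:
# it does not depend on the unknown noise level sigma. (c, alpha are
# illustrative defaults, not values fixed by the paper.)
import numpy as np
import cvxpy as cp
from scipy.stats import norm


def sqrt_lasso(X, y, c=1.1, alpha=0.05):
    n, p = X.shape
    lam = c * np.sqrt(n) * norm.ppf(1 - alpha / (2 * p))  # pivotal penalty
    beta = cp.Variable(p)
    # objective: root mean squared residual + scaled l1 penalty
    obj = cp.norm(y - X @ beta, 2) / np.sqrt(n) + (lam / n) * cp.norm(beta, 1)
    cp.Problem(cp.Minimize(obj)).solve()
    return beta.value


# Toy usage: sparse signal; the noise scale is never supplied to the estimator.
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s] = 1.0
y = X @ beta0 + 0.5 * rng.standard_normal(n)
print(sqrt_lasso(X, y)[:8].round(3))
```

Because the square root of the average squared residual is a norm of the residual vector, the objective is convex and the problem is a second-order cone program, which is what makes the estimator computationally tractable at scale.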
