Pivotal estimation via square-root Lasso in nonparametric regression

We propose a self-tuning $\sqrt{\mathrm{Lasso}}$ method that simultaneously resolves three important practical problems in high-dimensional regression analysis: it handles the unknown scale, heteroscedasticity, and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme cases, such as the infinite-variance case and the noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds for $\sqrt{\mathrm{Lasso}}$, including rates in the prediction norm and bounds on sparsity. Our analysis is based on new impact factors that are tailored to bounding the prediction norm. To cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums, which yields Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least squares (OLS) applied to the model selected by $\sqrt{\mathrm{Lasso}}$, accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of OLS post $\sqrt{\mathrm{Lasso}}$ is as good as that of $\sqrt{\mathrm{Lasso}}$ itself. As an application, we consider the use of $\sqrt{\mathrm{Lasso}}$ and OLS post $\sqrt{\mathrm{Lasso}}$ as estimators of nuisance parameters in a generic semiparametric problem (a nonlinear moment-condition or $Z$-problem), resulting in a construction of $\sqrt{n}$-consistent and asymptotically normal estimators of the main parameters.
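The estimator behind these results minimizes the square root of the average squared residual plus an $\ell_1$ penalty,
$$
\widehat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - x_i'\beta\bigr)^2} + \frac{\lambda}{n}\|\beta\|_1,
$$
so that the unknown noise level cancels from the first-order conditions and the penalty level $\lambda$ can be set without estimating the variance. Below is a minimal sketch of the estimator and of the OLS refit on its selected support; it assumes the cvxpy and scipy packages are available, and the function names are illustrative rather than the authors' own implementation.

```python
import numpy as np
import cvxpy as cp
from scipy.stats import norm

def sqrt_lasso(X, y, lam):
    """Square-root Lasso: min ||y - X b||_2 / sqrt(n) + (lam / n) * ||b||_1.

    The noise scale cancels from the score, so lam can be chosen
    pivotally, without a preliminary variance estimate.
    """
    n, p = X.shape
    beta = cp.Variable(p)
    objective = cp.norm2(y - X @ beta) / np.sqrt(n) + (lam / n) * cp.norm1(beta)
    cp.Problem(cp.Minimize(objective)).solve()  # a conic (SOCP) program
    return beta.value

def ols_post_sqrt_lasso(X, y, beta_hat, tol=1e-6):
    """OLS refit on the support selected by square-root Lasso."""
    support = np.abs(beta_hat) > tol
    refit = np.zeros_like(beta_hat)
    if support.any():
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        refit[support] = coef
    return refit

# Example usage with a commonly recommended pivotal penalty level,
# lam = c * sqrt(n) * Phi^{-1}(1 - alpha / (2 p)) with c > 1, e.g. c = 1.1.
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s] = 1.0
y = X @ beta0 + rng.standard_normal(n)  # the noise scale need not be known
lam = 1.1 * np.sqrt(n) * norm.ppf(1 - 0.05 / (2 * p))
b = sqrt_lasso(X, y, lam)
b_post = ols_post_sqrt_lasso(X, y, b)
```

Because the objective is a second-order cone program, off-the-shelf conic solvers handle it directly, and no cross-validation over the noise level is needed, which is the practical content of "self-tuning."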
