On cross-validated Lasso in high dimensions

In this paper, we derive non-asymptotic error bounds for the Lasso estimator when its penalty parameter is chosen using $K$-fold cross-validation. Our bounds imply that the cross-validated Lasso estimator has nearly optimal rates of convergence in the prediction, $L^2$, and $L^1$ norms. For example, we show that in the model with Gaussian noise, and under fairly general assumptions on the candidate set of penalty-parameter values, the estimation error of the cross-validated Lasso estimator converges to zero in the prediction norm at the rate $\sqrt{s\log p / n}\times \sqrt{\log(p n)}$, where $n$ is the sample size, $p$ is the number of covariates, and $s$ is the number of non-zero coefficients in the model. Thus, the cross-validated Lasso estimator achieves the fastest possible rate of convergence in the prediction norm up to the small logarithmic factor $\sqrt{\log(p n)}$, and similar conclusions hold for the convergence rates in the $L^2$ and $L^1$ norms. Importantly, our results cover the case when $p$ is (potentially much) larger than $n$ and also allow for non-Gaussian noise. Our paper therefore provides a justification for the widespread practice of using cross-validation to choose the penalty parameter for the Lasso estimator.
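To make the setup concrete, the following is a minimal simulation sketch, not taken from the paper: it draws data from a sparse Gaussian linear model, selects the Lasso penalty by $K$-fold cross-validation over a candidate grid, and compares the empirical prediction-norm error with the $\sqrt{s\log p / n}\times \sqrt{\log(p n)}$ rate quoted above. The design, the geometric penalty grid, the choice $K=5$, and all constants are illustrative assumptions; scikit-learn's LassoCV stands in for the generic cross-validated Lasso.

```python
# Illustrative sketch (assumptions, not the paper's experiment): cross-validated
# Lasso in a sparse Gaussian linear model, compared against the rate from the
# abstract. Requires numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5                      # sample size, covariates, sparsity
X = rng.standard_normal((n, p))            # Gaussian design (an assumption)
beta = np.zeros(p)
beta[:s] = 1.0                             # s non-zero coefficients
y = X @ beta + rng.standard_normal(n)      # Gaussian noise

# Candidate set of penalty values: a geometric grid, one "fairly general" choice
alphas = np.geomspace(1e-3, 1.0, 50)
fit = LassoCV(alphas=alphas, cv=5, fit_intercept=False).fit(X, y)  # K = 5 folds

# Prediction-norm error ||X(beta_hat - beta)||_2 / sqrt(n) vs. the theoretical rate
pred_err = np.linalg.norm(X @ (fit.coef_ - beta)) / np.sqrt(n)
rate = np.sqrt(s * np.log(p) / n) * np.sqrt(np.log(p * n))
print(f"CV-selected penalty: {fit.alpha_:.4f}")
print(f"prediction error: {pred_err:.3f}, rate sqrt(s log p / n) * sqrt(log(pn)): {rate:.3f}")
```

Rerunning the sketch across seeds and $(n, p, s)$ configurations shows the prediction error shrinking roughly in proportion to the stated rate, which is the qualitative behavior the bounds guarantee.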
