Leave-one-out cross-validation is risk consistent for lasso

The lasso procedure pervades the statistical and signal processing literature, and as such, is the target of substantial theoretical and applied research. While much of this research focuses on the desirable properties that lasso possesses (predictive risk consistency, sign consistency, correct model selection), these results assume that the tuning parameter is chosen in an oracle fashion. Yet this is impossible in practice. Instead, data analysts must use the data twice: once to choose the tuning parameter and again to estimate the model. Only heuristics have ever justified such a procedure. To this end, we give the first definitive answer about the risk consistency of lasso when the tuning parameter is chosen via cross-validation. We show that, under some restrictions on the design matrix, the lasso estimator remains risk consistent with an empirically chosen tuning parameter.
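
The two-step procedure the abstract describes (use the data once to select the tuning parameter by cross-validation, then again to fit the final model) can be made concrete. Below is a minimal sketch in Python, assuming scikit-learn is available; the paper is theoretical and prescribes no implementation, so the simulated data, the candidate grid, and all names here are illustrative. In scikit-learn's parameterization, alpha plays the role of the tuning parameter lambda in the objective (1/(2n)) * ||y - X beta||_2^2 + lambda * ||beta||_1.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut

# Simulated sparse regression problem (illustrative values only).
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]            # three nonzero coefficients
y = X @ beta + rng.standard_normal(n)

# Candidate tuning parameters on a log grid (an assumed, illustrative grid).
lambdas = np.logspace(-3, 1, 30)

# First use of the data: leave-one-out CV estimate of risk at each lambda.
loo_risk = np.empty(len(lambdas))
for i, lam in enumerate(lambdas):
    errs = []
    for train, test in LeaveOneOut().split(X):
        fit = Lasso(alpha=lam, max_iter=10_000).fit(X[train], y[train])
        errs.append((y[test] - fit.predict(X[test])) ** 2)
    loo_risk[i] = np.mean(errs)

# Second use of the data: refit the lasso at the empirically chosen lambda.
lam_hat = lambdas[np.argmin(loo_risk)]
final_fit = Lasso(alpha=lam_hat, max_iter=10_000).fit(X, y)
print(f"chosen lambda: {lam_hat:.4f}, nonzeros: {np.sum(final_fit.coef_ != 0)}")
```

The paper's result concerns exactly this construction: under conditions on the design matrix, the estimator refit at lam_hat inherits risk consistency even though lam_hat was chosen from the same data. In practice, scikit-learn's LassoCV packages the same select-then-refit procedure more efficiently by warm-starting along the lambda path.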
