A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees

We introduce a novel scheme for choosing the regularization parameter in high-dimensional linear regression with the Lasso. This scheme, inspired by Lepski's method for bandwidth selection in non-parametric regression, comes with both optimal finite-sample guarantees and a fast algorithm. In particular, for any design matrix such that the Lasso has low sup-norm error under an "oracle choice" of the regularization parameter, we show that our method matches the oracle performance up to a small constant factor, and that it can be implemented by performing simple tests along a single Lasso path. On simulated and real data, we find that our scheme can be both faster and more accurate than standard approaches such as cross-validation.
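To make the "simple tests along a single Lasso path" concrete, the sketch below illustrates one Lepski-style instantiation in Python with scikit-learn: walk the path from the largest regularization parameter downward and keep decreasing it as long as the current estimate stays within a sup-norm tolerance C·(α_i + α_j) of every estimate computed at a larger parameter. The constant C, the grid size, and the exact form of the test are illustrative assumptions here, not the paper's precise procedure.

```python
import numpy as np
from sklearn.linear_model import lasso_path


def lepski_lasso_select(X, y, C=1.5, n_alphas=100):
    """Lepski-style selection of the Lasso tuning parameter (illustrative sketch).

    Walk the Lasso path from the largest alpha downward and stop as soon as the
    current estimate drifts, in sup-norm, by more than C * (alpha_i + alpha_j)
    from some estimate computed at a larger alpha_j.
    """
    # alphas are returned in decreasing order; coefs has shape (n_features, n_alphas)
    alphas, coefs, _ = lasso_path(X, y, n_alphas=n_alphas)
    best = 0
    for i in range(1, len(alphas)):
        # Pairwise sup-norm tests against all estimates at larger alphas
        passes = all(
            np.max(np.abs(coefs[:, i] - coefs[:, j])) <= C * (alphas[i] + alphas[j])
            for j in range(i)
        )
        if not passes:
            break
        best = i
    return alphas[best], coefs[:, best]


if __name__ == "__main__":
    # Toy example: sparse linear model with Gaussian design and noise
    rng = np.random.default_rng(0)
    n, p, s = 100, 200, 5
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 1.0
    y = X @ beta + 0.5 * rng.standard_normal(n)

    alpha_hat, beta_hat = lepski_lasso_select(X, y)
    print("selected alpha:", alpha_hat)
    print("selected support:", np.flatnonzero(np.abs(beta_hat) > 1e-8))
```

Note the cost profile this sketch is meant to convey: only one Lasso path is fitted, and the tests reduce to at most O(K²) sup-norm comparisons over a grid of K parameters, whereas K-fold cross-validation requires fitting a full path on each of K data splits.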
