The Adaptive Lasso and Its Oracle Properties

The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for lasso variable selection to be consistent. Consequently, there exist scenarios in which the lasso is inconsistent for variable selection. We then propose a new version of the lasso, called the adaptive lasso, in which adaptive weights are used to penalize different coefficients in the ℓ1 penalty. We show that the adaptive lasso enjoys the oracle properties; namely, it performs as well as if the true underlying model were given in advance. Like the lasso, the adaptive lasso is shown to be near-minimax optimal. Furthermore, the adaptive lasso can be solved by the same efficient algorithm used to solve the lasso. We also discuss the extension of the adaptive lasso to generalized linear models and show that the oracle properties still hold under mild regularity conditions. As a byproduct of our theory, the nonnegative garotte is shown to be consistent for variable selection.
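For concreteness, the weighted ℓ1 criterion described above takes the form

    \hat{\beta}^{*} = \arg\min_{\beta} \, \lVert y - X\beta \rVert^{2}
                      + \lambda \sum_{j=1}^{p} \hat{w}_j \lvert \beta_j \rvert,
    \qquad \hat{w}_j = 1 / \lvert \hat{\beta}_j \rvert^{\gamma}, \quad \gamma > 0,

where the weights are built from a root-n-consistent pilot estimate β̂ (e.g., ordinary least squares), so coefficients with large pilot estimates are penalized lightly and small ones heavily.

Because the weights merely rescale the columns of the design matrix, the claim that the adaptive lasso can be solved by the same algorithm as the lasso amounts to a change of variables. Below is a minimal sketch assuming OLS as the pilot estimator and scikit-learn's coordinate-descent Lasso as the ordinary-lasso solver; the function name adaptive_lasso, the default γ = 1, and the small guard term are illustrative choices, not taken from the paper.

    import numpy as np
    from sklearn.linear_model import Lasso, LinearRegression

    def adaptive_lasso(X, y, lam=0.1, gamma=1.0):
        # Step 1: root-n-consistent pilot estimate (OLS here, an assumption)
        # and the adaptive weights w_j = 1 / |beta_j|^gamma.
        beta_pilot = LinearRegression(fit_intercept=False).fit(X, y).coef_
        w = 1.0 / (np.abs(beta_pilot) ** gamma + 1e-12)  # guard against zero pilots

        # Step 2: change of variables. With x*_j = x_j / w_j and b*_j = w_j * b_j,
        # X @ beta == X_star @ beta_star and the weighted l1 penalty becomes plain l1.
        X_star = X / w

        # Step 3: a single ordinary lasso fit on the rescaled design.
        fit = Lasso(alpha=lam, fit_intercept=False).fit(X_star, y)

        # Step 4: map the solution back to the original coordinates.
        return fit.coef_ / w

    # Toy check: sparse truth, mild noise.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 8))
    beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    y = X @ beta_true + rng.standard_normal(200)
    print(np.round(adaptive_lasso(X, y), 2))

Note that scikit-learn's Lasso minimizes (1/2n)·RSS + α‖β‖₁, so its α corresponds to λ only up to scaling; in practice both λ and γ would be chosen by cross-validation, and the single lasso solve could equally be carried out by LARS.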
