On the ℓ1-ℓq Regularized Regression

In this paper we consider the problem of grouped variable selection in high-dimensional regression using ℓ1-ℓq regularization (1 ≤ q ≤ ∞), which can be viewed as a natural generalization of ℓ1-ℓ2 regularization (the group Lasso). The key setting is that the dimensionality pₙ can grow much faster than the sample size n, i.e. pₙ ≫ n (in our case pₙ is the number of groups), while the number of relevant groups is small. The main conclusion is that many of the good properties of ℓ1 regularization (the Lasso) carry over to the ℓ1-ℓq case (1 ≤ q ≤ ∞), even when the number of variables within each group also grows with the sample size. With a fixed design, we show that the whole family of estimators is both estimation consistent and variable-selection consistent, each under its own conditions. With a random design, we also establish a persistence result under a much weaker condition. These results provide a unified treatment of the whole family of estimators, ranging from q = 1 (the Lasso) to q = ∞ (iCAP), with q = 2 (the group Lasso) as a special case. When no group structure is available, the analysis reduces to existing results for the Lasso estimator (q = 1).
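
For concreteness, the family of estimators described above can be summarized by a single penalized least-squares criterion. The display below is a minimal sketch in standard group-penalty notation; the symbols λₙ (regularization parameter) and G₁, …, G_{pₙ} (group index sets) are our own labels and need not match the paper's notation.

$$
\hat{\beta} \;=\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}} \; \frac{1}{2n}\,\|y - X\beta\|_2^2 \;+\; \lambda_n \sum_{g=1}^{p_n} \|\beta_{G_g}\|_q, \qquad 1 \le q \le \infty,
$$

where β_{G_g} denotes the coefficients in group g and ‖·‖_q is the within-group ℓq norm. Taking q = 1 makes the penalty decouple over coordinates and recovers the Lasso, q = 2 gives the group Lasso, and q = ∞ gives the iCAP penalty referred to in the abstract.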
