Union support recovery in high-dimensional multivariate regression

In the problem of multivariate regression, a $K$-dimensional response vector is regressed upon a common set of $p$ covariates, with a matrix $B^* \in \mathbb{R}^{p \times K}$ of regression coefficients. We study the behavior of the group Lasso with $\ell_1/\ell_2$ regularization for the union support problem, meaning that the set of $s$ rows for which $B^*$ is non-zero is recovered exactly. Studying this problem under high-dimensional scaling, we show that the group Lasso recovers the exact row pattern, with high probability over the random design and noise, for scalings of $(n, p, s)$ such that the sample complexity parameter given by $\theta(n, p, s) := n/[2\psi(B^*)\log(p - s)]$ exceeds a critical threshold. Here $n$ is the sample size, $p$ is the ambient dimension of the regression model, $s$ is the number of non-zero rows, and $\psi(B^*)$ is a sparsity-overlap function that measures a combination of the sparsities and overlaps of the $K$ regression coefficient vectors that constitute the model. This sparsity-overlap function reveals that, if the design is uncorrelated on the active rows, block $\ell_1/\ell_2$ regularization for multivariate regression never harms performance relative to an ordinary Lasso approach, and can yield substantial improvements in sample complexity (up to a factor of $K$) when the regression vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the group Lasso.
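
To make the estimator concrete, here is a minimal simulation sketch in Python. It uses scikit-learn's MultiTaskLasso, which solves the block $\ell_1/\ell_2$-regularized least-squares program described above; the problem sizes, the noise level, the regularization level alpha, and the zero threshold are illustrative choices, not values prescribed by the analysis.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)

# Problem sizes: n samples, p covariates, s active rows, K responses (illustrative).
n, p, s, K = 200, 64, 8, 4

# Sparse coefficient matrix B* with s non-zero rows (the union support).
B_star = np.zeros((p, K))
B_star[:s, :] = rng.normal(size=(s, K))

# Random Gaussian design and noisy responses Y = X B* + W.
X = rng.normal(size=(n, p))
Y = X @ B_star + 0.5 * rng.normal(size=(n, K))

# MultiTaskLasso minimizes (1/(2n)) ||Y - X B||_F^2 + alpha * sum_j ||B_j||_2,
# i.e. a block l1/l2 (group Lasso) program with rows of B as the groups.
fit = MultiTaskLasso(alpha=0.1).fit(X, Y)

# Estimated union support: rows of B-hat with non-negligible l2 norm.
support_hat = np.flatnonzero(np.linalg.norm(fit.coef_.T, axis=1) > 1e-8)
print("true support:     ", np.arange(s))
print("estimated support:", support_hat)
```

For fixed $(p, s)$ and a given $B^*$, rerunning this simulation while varying $n$ is the natural way to observe the threshold phenomenon: the probability of exact row-support recovery transitions from near zero to near one as $\theta(n, p, s)$ crosses its critical value.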

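The abstract does not spell out $\psi(B^*)$; the sketch below assumes the definition used in the underlying analysis, $\psi(B^*) = \lambda_{\max}(\zeta(B^*)^T \Sigma_{SS}\, \zeta(B^*))$, where $\zeta(B^*)$ collects the row-normalized active rows of $B^*$ and $\Sigma_{SS}$ is the design covariance on the active set; treat that formula as an assumption. Under it, the example shows numerically why, for an uncorrelated design, $\psi$ ranges from $s/K$ (suitably orthogonal regressions) to $s$ (identical regressions), which is the source of the factor-of-$K$ statement above.

```python
import numpy as np

def psi(B_active, Sigma_SS):
    # Assumed form: psi(B*) = lambda_max(Z^T Sigma_SS Z), with Z the
    # row-normalized active block of B* (not stated in the abstract).
    Z = B_active / np.linalg.norm(B_active, axis=1, keepdims=True)
    return np.linalg.eigvalsh(Z.T @ Sigma_SS @ Z).max()

def theta(n, p, s, psi_val):
    # Sample complexity parameter from the abstract:
    # theta(n, p, s) = n / [2 psi(B*) log(p - s)].
    return n / (2.0 * psi_val * np.log(p - s))

n, p, s, K = 200, 64, 8, 4
I_s = np.eye(s)  # uncorrelated design on the active rows

# Worst case for the group penalty: all K regression vectors identical,
# so the active rows fully overlap and psi = s.
B_same = np.ones((s, K))
print(psi(B_same, I_s))   # = s = 8

# Favorable case: when K divides s, we can make the columns of Z orthogonal
# with equal norms, giving psi = s / K.
B_orth = np.kron(np.eye(K), np.ones((s // K, 1)))  # shape (s, K)
print(psi(B_orth, I_s))   # = s / K = 2

# Corresponding sample complexity parameters: a smaller psi inflates theta,
# i.e. the same n is further past the recovery threshold.
print(theta(n, p, s, float(s)), theta(n, p, s, s / K))
```

Since the ordinary Lasso applied separately to each of the $K$ regressions effectively behaves as if $\psi = s$, the gap between the two cases above is exactly the factor-of-$K$ improvement in sample complexity claimed for suitably orthogonal regression vectors.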