Union support recovery in high-dimensional multivariate regression

We study the behavior of the multivariate group Lasso, in which block regularization based on the ℓ1/ℓ2 norm is used for support union recovery, that is, recovery of the set of s rows for which the coefficient matrix B* is non-zero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter θ(n, p, s) := n/[2 ψ(B*) log(p − s)]. Here n is the sample size, and ψ(B*) is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n, p, s) such that θ(n, p, s) exceeds a critical level θ_u, and fails for sequences such that θ(n, p, s) lies below a critical level θ_ℓ. For the special case of the standard Gaussian ensemble, we show that θ_ℓ = θ_u, so that the characterization is sharp. The sparsity-overlap function ψ(B*) reveals that, if the design is uncorrelated on the active rows, ℓ1/ℓ2 regularization for multivariate regression never harms performance relative to an ordinary Lasso approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the regression vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
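As a concrete illustration of the estimator studied here, the following Python sketch (not the authors' code) fits a multivariate group Lasso by proximal gradient descent: the ℓ1/ℓ2 block penalty acts on the rows of B, so its proximal operator is row-wise soft thresholding of the row ℓ2 norms, and the estimated support union is read off as the set of rows whose norm is non-zero. The function names, step size, and regularization level below are illustrative assumptions, not prescriptions from the paper.

```python
# Minimal sketch, assuming the objective
#   min_B  (1/(2n)) * ||Y - X B||_F^2 + lam * sum_j ||B_j||_2,
# where B_j denotes the j-th row of B (the l1/l2 block penalty).
import numpy as np


def prox_l1l2(B, tau):
    """Row-wise soft thresholding: shrink each row's l2 norm by tau."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * B


def multivariate_group_lasso(X, Y, lam, n_iter=500):
    """Proximal gradient (ISTA) for l1/l2-regularized multivariate regression."""
    n, p = X.shape
    K = Y.shape[1]
    B = np.zeros((p, K))
    # Step size 1/L, with L the Lipschitz constant of the smooth part.
    L = np.linalg.norm(X, 2) ** 2 / n
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = prox_l1l2(B - grad / L, lam / L)
    return B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, s, K = 200, 500, 10, 3
    X = rng.standard_normal((n, p))                 # standard Gaussian ensemble
    B_true = np.zeros((p, K))
    B_true[:s] = rng.standard_normal((s, K))        # s active rows (support union)
    Y = X @ B_true + 0.5 * rng.standard_normal((n, K))
    lam = 0.5 * np.sqrt(np.log(p - s) / n)          # heuristic regularization level
    B_hat = multivariate_group_lasso(X, Y, lam)
    support_hat = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 1e-3)
    print("estimated support union:", support_hat)
```

Rerunning such a simulation over a range of sample sizes n, and recording how often the estimated support union matches the true one, is one way to visualize the threshold behavior in θ(n, p, s) described above.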
