Simultaneous support recovery in high dimensions: Benefits and perils of block l1/l∞-regularization

Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports of size at most s. This set-up suggests the use of l1/l∞-regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of l1/l∞-regularized quadratic programming, considering both consistency rates in the l∞-norm and how the minimal sample size n required for consistent variable selection scales with model dimension, sparsity, and overlap between the supports. We first establish bounds on the l∞-error as well as sufficient conditions for exact variable selection, both for fixed design matrices and for designs drawn randomly from general Gaussian distributions. Specializing to the case of r = 2 linear regression problems with standard Gaussian designs whose supports overlap in a fraction α ∈ [0, 1] of their entries, we prove that the l1/l∞-regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞(n, p, s, α) = n/{(4 − 3α)s log(p − (2 − α)s)}. An implication is that l1/l∞-regularization yields improved statistical efficiency when the overlap parameter is large enough (α > 2/3), but worse statistical efficiency than a naive Lasso-based approach for moderate to small overlap (α < 2/3). Empirical simulations show close agreement between the theory and behavior in practice. These results show that caution must be exercised in applying l1/l∞ block regularization: if the data do not closely match the assumed shared-support structure, it can impair statistical performance relative to computationally less expensive schemes.
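To make the estimator concrete, the following is a minimal sketch (not the authors' implementation) of l1/l∞-regularized multi-task regression for r = 2 problems with partially overlapping supports, written with cvxpy. The problem sizes, noise level, regularization weight, and support-detection threshold are illustrative assumptions, not values taken from the paper; the last line simply evaluates the rescaled sample size θ1,∞ from the abstract for the simulated configuration.

```python
# Minimal sketch of l1/l_inf-regularized multi-task regression (illustrative only).
# The penalty is the sum over rows of the l_inf norm of each row of the p x r
# coefficient matrix B, which encourages a shared (union) support across tasks.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, r, s = 200, 64, 2, 8          # samples, dimension, tasks, support size (assumed)
alpha = 0.75                        # fraction of overlapping support entries
sigma = 0.25                        # assumed noise standard deviation

# Build r = 2 regression vectors whose supports overlap in a fraction alpha.
overlap = int(round(alpha * s))
supp1 = np.arange(s)
supp2 = np.concatenate([np.arange(overlap), np.arange(s, 2 * s - overlap)])
B_true = np.zeros((p, r))
B_true[supp1, 0] = rng.choice([-1.0, 1.0], size=s)
B_true[supp2, 1] = rng.choice([-1.0, 1.0], size=s)

X = rng.standard_normal((n, p))                 # standard Gaussian design
Y = X @ B_true + sigma * rng.standard_normal((n, r))

# l1/l_inf-regularized quadratic program:
#   min_B  (1/(2n)) ||Y - X B||_F^2 + lam * sum_j max_k |B[j, k]|
B = cp.Variable((p, r))
lam = sigma * np.sqrt(2 * np.log(p) / n)        # heuristic scaling, for illustration
block_norm = cp.sum(cp.max(cp.abs(B), axis=1))  # sum of row-wise l_inf norms
cp.Problem(cp.Minimize(cp.sum_squares(Y - X @ B) / (2 * n)
                       + lam * block_norm)).solve()

# Compare the estimated union support (rows with a nonzero entry) to the truth.
est_support = np.max(np.abs(B.value), axis=1) > 1e-3
true_support = np.max(np.abs(B_true), axis=1) > 0
print("estimated union support matches truth:", np.array_equal(est_support, true_support))

# Rescaled sample size from the phase-transition result in the abstract:
#   theta_{1,inf}(n, p, s, alpha) = n / ((4 - 3*alpha) * s * log(p - (2 - alpha) * s))
theta = n / ((4 - 3 * alpha) * s * np.log(p - (2 - alpha) * s))
print("rescaled sample size theta_{1,inf}:", round(theta, 2))
```

For α > 2/3 the effective constant (4 − 3α) drops below 2, which is the abstract's criterion for the block-regularized method to be more statistically efficient than solving each Lasso problem separately; for α < 2/3 the opposite holds.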
