Modeling association between multivariate correlated outcomes and high-dimensional sparse covariates: the adaptive SVS method

ABSTRACT The problem of modeling the relationship between a set of covariates and a multivariate response with correlated components arises in many areas of research, such as genetics, psychometrics, and signal processing. In the linear regression framework, this task can be addressed using a number of existing methods. In the high-dimensional sparse setting, most of these methods rely on penalization to efficiently estimate the regression matrix; examples include the lasso, the group lasso, the adaptive group lasso, and the simultaneous variable selection (SVS) method. Crucially, a suitably chosen penalty also allows the correlation structure within the multivariate response to be exploited efficiently. In this paper we introduce a novel variant of the SVS method, called the adaptive SVS, which is closely linked to the adaptive group lasso. Via a simulation study we investigate its performance in the high-dimensional sparse regression setting, provide a comparison with a number of other popular methods under different scenarios, and show that the adaptive SVS is a powerful tool for efficient recovery of signal in this setting. The methods are also applied to genetic data.
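As a point of reference for the methods named above, the multivariate linear model and a generic row-penalized estimator of the SVS type can be written as follows. This is only a sketch: the choice of row norm and the form of the adaptive weights $w_j$ (taken here by analogy with the adaptive group lasso) are assumptions, not the paper's exact definition of the adaptive SVS.

\[
  Y = XB + E, \qquad
  \widehat{B}(\lambda) \;=\; \arg\min_{B \in \mathbb{R}^{p \times q}}
  \tfrac{1}{2}\,\lVert Y - XB \rVert_F^2
  \;+\; \lambda \sum_{j=1}^{p} w_j \,\lVert B_{j\cdot} \rVert,
\]

where $Y$ is the $n \times q$ response matrix, $X$ the $n \times p$ design matrix, $B$ the $p \times q$ regression matrix, and $\lVert B_{j\cdot}\rVert$ a norm of the $j$-th row of $B$ (the $\ell_\infty$ norm in the original SVS proposal, the $\ell_2$ norm in the group lasso). The adaptive weights, e.g. $w_j = \lVert \widetilde{B}_{j\cdot} \rVert^{-\gamma}$ for some pilot estimate $\widetilde{B}$ and $\gamma > 0$, penalize rows with weak initial signal more heavily, so that a covariate is either selected for all $q$ responses simultaneously or excluded from all of them.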
