Sparse meta-analysis with high-dimensional data.

Meta-analysis plays an important role in summarizing and synthesizing scientific evidence derived from multiple studies. With high-dimensional data, the incorporation of variable selection into meta-analysis improves model interpretation and prediction. Existing variable selection methods require direct access to raw data, which may not be available in practical situations. We propose a new approach, sparse meta-analysis (SMA), in which variable selection for meta-analysis is based solely on summary statistics and the effect sizes of each covariate are allowed to vary among studies. We show that the SMA enjoys the oracle property if the estimated covariance matrix of the parameter estimators from each study is available. We also show that our approach achieves selection consistency and estimation consistency even when summary statistics include only the variance estimators or no variance/covariance information at all. Simulation studies and applications to high-throughput genomics studies demonstrate the usefulness of our approach.
