metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analysing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, the modest sample sizes of individual cohorts and the restricted availability of individual-level genotype-phenotype data across cohorts limit the use of multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of one or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where the original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows excellent agreement with the pooled individual-level analysis of the original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco.
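To make the core idea concrete, the following is a minimal sketch of canonical correlation analysis computed from covariance blocks alone, with a simple shrinkage step applied when the assembled matrix is not positive definite. It is not the metaCCA implementation: the function names (`shrink_until_pd`, `canonical_correlations`), the diagonal shrinkage target, and the step size are illustrative assumptions, and the sketch assumes the genotype, phenotype, and cross-covariance blocks have already been estimated from summary statistics.

```python
import numpy as np


def shrink_until_pd(S, step=0.01, eps=1e-8):
    """Shrink off-diagonal entries of S toward zero until S is positive definite.

    Hypothetical helper: the diagonal target and fixed step are assumptions,
    not the shrinkage scheme used by metaCCA. Assumes a positive diagonal.
    Returns the shrunk matrix and the shrinkage intensity that was needed.
    """
    target = np.diag(np.diag(S))
    lam = 0.0
    shrunk = S
    while np.linalg.eigvalsh(shrunk).min() <= eps and lam < 1.0:
        lam = min(1.0, lam + step)
        shrunk = (1.0 - lam) * S + lam * target
    return shrunk, lam


def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def canonical_correlations(S_xx, S_xy, S_yy):
    """Canonical correlations from covariance blocks only (no raw data).

    They are the singular values of S_xx^(-1/2) S_xy S_yy^(-1/2).
    """
    K = inv_sqrt(S_xx) @ S_xy @ inv_sqrt(S_yy)
    return np.clip(np.linalg.svd(K, compute_uv=False), 0.0, 1.0)


def cca_from_blocks(S_xx, S_xy, S_yy):
    """Assemble the full covariance, shrink if needed, then run CCA."""
    p = S_xx.shape[0]
    full = np.block([[S_xx, S_xy], [S_xy.T, S_yy]])
    full, lam = shrink_until_pd(full)
    return canonical_correlations(full[:p, :p], full[:p, p:], full[p:, p:]), lam
```

A call such as `cca_from_blocks(S_xx, S_xy, S_yy)` returns the canonical correlations together with the shrinkage intensity that was applied. Shrinking the full matrix rather than each block separately keeps the three blocks mutually consistent, which matters when they come from different sources.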
