metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, the modest sample sizes of individual cohorts and the restricted availability of individual-level genotype-phenotype data across cohorts limit the use of multivariate tests.

Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single study or multiple studies that allows a multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where the original individual-level records are not available, and it employs a covariance shrinkage algorithm to achieve robustness. A multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows excellent agreement with the pooled individual-level analysis of the original data. Motivated by the strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies.

Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco

Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi

Supplementary information: Supplementary data are available at Bioinformatics online.
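The central idea, computing canonical correlations from covariance blocks instead of from individual-level records, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the metaCCA implementation. The function summary_cca and its parameters are hypothetical. The univariate coefficients are assumed to come from standardized genotypes and phenotypes, so that they approximate genotype-phenotype correlations. The genotype correlation (LD) matrix and the phenotype correlation matrix are simply passed in. The shrinkage step (scaling all off-diagonal entries by a constant factor until the assembled matrix is positive definite) is a simple stand-in that may differ from the algorithm metaCCA actually uses.

```python
import numpy as np


def summary_cca(beta, ld, pheno_corr, n, shrink_step=0.999, max_iter=10_000):
    """Hypothetical sketch: CCA between p SNPs and q traits from summary statistics.

    beta       : (p, q) univariate coefficients, ASSUMED to come from standardized
                 genotypes and phenotypes (so they approximate cross-correlations).
    ld         : (p, p) genotype correlation (LD) matrix, e.g. from a reference panel.
    pheno_corr : (q, q) phenotype correlation matrix.
    n          : sample size.
    Returns (canonical correlations, Bartlett chi-square statistic, degrees of freedom).
    """
    beta = np.asarray(beta, dtype=float)
    ld = np.asarray(ld, dtype=float)
    pheno_corr = np.asarray(pheno_corr, dtype=float)
    p, q = beta.shape

    # Assemble the full (p+q) x (p+q) correlation matrix from its blocks.
    full = np.block([[ld, beta], [beta.T, pheno_corr]])

    # Stand-in shrinkage (an assumption): scale every off-diagonal entry by
    # shrink_step until the matrix admits a Cholesky factor, i.e. is positive definite.
    diag = np.diag(np.diag(full))
    off = full - diag
    for _ in range(max_iter):
        try:
            np.linalg.cholesky(diag + off)
            break
        except np.linalg.LinAlgError:
            off *= shrink_step
    else:
        raise ValueError("could not make the matrix positive definite")
    full = diag + off

    sxx, sxy, syy = full[:p, :p], full[:p, p:], full[p:, p:]

    # Whiten both sides with Cholesky factors: K = Lx^{-1} Sxy Ly^{-T}.
    # The singular values of K are the canonical correlations.
    lx = np.linalg.cholesky(sxx)
    ly = np.linalg.cholesky(syy)
    k = np.linalg.solve(lx, sxy)      # Lx^{-1} Sxy
    k = np.linalg.solve(ly, k.T).T    # (Lx^{-1} Sxy) Ly^{-T}
    r = np.linalg.svd(k, compute_uv=False)

    # Bartlett's chi-square approximation to Wilks' lambda for
    # H0: all canonical correlations are zero, with p*q degrees of freedom.
    stat = -(n - 1 - (p + q + 1) / 2) * np.sum(np.log(1 - r**2))
    return r, stat, p * q


if __name__ == "__main__":
    # Toy example with made-up numbers: 2 SNPs in moderate LD, 2 correlated traits.
    beta = np.array([[0.10, 0.05], [0.08, 0.02]])
    ld = np.array([[1.0, 0.4], [0.4, 1.0]])
    pheno = np.array([[1.0, 0.6], [0.6, 1.0]])
    r, stat, df = summary_cca(beta, ld, pheno, n=5000)
    print("canonical correlations:", r)
    print("Bartlett statistic:", stat, "df:", df)
```

Cholesky factors are used as the whitening transform because they are cheaper and numerically more stable than explicit inverse square roots; any whitening of Sxx and Syy yields the same singular values. A p-value for the statistic can be obtained with scipy.stats.chi2.sf(stat, df).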
