CoMM-S2: a collaborative mixed model using summary statistics in transcriptome-wide association studies

Motivation: Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links through which genetic variants cause complex traits remain elusive. To advance our understanding of these links, various consortia have collected vast volumes of genomic data that enable us to investigate the role genetic variants play in regulating gene expression. Recently, a collaborative mixed model (CoMM) [9] was proposed to jointly interrogate the effects of the genome on complex traits by integrating a GWAS dataset with an expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and therefore cannot make full use of the widely available GWAS summary statistics. Statistically efficient methods that leverage transcriptome information using only GWAS summary statistics are therefore needed.

Results: In this study, we propose a novel probabilistic model, CoMM-S2, that examines the mechanistic role of genetic variants using only GWAS summary statistics instead of individual-level GWAS data. Like CoMM, CoMM-S2 combines two models: the first examines the relationship between gene expression and genotype, and the second examines the relationship between the phenotype and the gene expression predicted by the first model. Unlike CoMM, however, CoMM-S2 does not need individual-level GWAS data. Using both simulation studies and real data analysis, we demonstrate that although CoMM-S2 uses only GWAS summary statistics, it performs comparably to CoMM, which uses individual-level GWAS data.
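The abstract does not write out the two models, so the following is only a hedged illustration of why summary statistics can stand in for individual-level GWAS data in the second model. It assumes a simplified two-stage form: expression y = X1 w + e1 in the eQTL sample, and phenotype z = alpha X2 w + e2 in the GWAS sample. It is not CoMM-S2 itself (which, like CoMM, fits both models jointly and accounts for uncertainty in the eQTL weights); it simply plugs in ridge-regression weights and estimates alpha by least squares. With standardized genotypes, X2^T z = n2 b (b = marginal per-SNP effects) and X2^T X2 = n2 R (R = LD matrix), so alpha_hat = w^T b / (w^T R w) needs only b and R. All function names, the simulation settings, and the ridge penalty are hypothetical.

```python
import random

# Hypothetical two-stage sketch (NOT CoMM-S2's joint model): plug-in ridge
# eQTL weights followed by least squares for the gene-trait effect alpha.


def simulate_genotypes(n, p, rng, maf=0.3):
    """Draw n x p genotype codes (0/1/2) for independent SNPs with a shared MAF."""
    return [[sum(rng.random() < maf for _ in range(2)) for _ in range(p)]
            for _ in range(n)]


def standardize(X):
    """Center each column and scale it to unit population variance."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        mean = sum(X[i][j] for i in range(n)) / n
        sd = (sum((X[i][j] - mean) ** 2 for i in range(n)) / n) ** 0.5 or 1.0
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / sd
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Compute A v for a row-major matrix A."""
    return [dot(row, v) for row in A]


def tmatvec(A, v):
    """Compute A^T v without forming the transpose."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def gram(X):
    """Compute X^T X."""
    p = len(X[0])
    return [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x


def two_stage_demo(n1=200, n2=500, p=10, alpha=0.5, lam=1.0, seed=1):
    """Return (alpha from individual-level data, alpha from summary statistics)."""
    rng = random.Random(seed)
    beta = [rng.gauss(0, 0.5) for _ in range(p)]                 # true eQTL effects
    X1 = standardize(simulate_genotypes(n1, p, rng))              # eQTL sample
    X2 = standardize(simulate_genotypes(n2, p, rng))              # GWAS sample
    y = [v + rng.gauss(0, 1) for v in matvec(X1, beta)]           # gene expression
    z = [alpha * v + rng.gauss(0, 1) for v in matvec(X2, beta)]   # phenotype

    # Model 1: ridge eQTL weights w = (X1^T X1 + lam I)^{-1} X1^T y.
    A = gram(X1)
    for j in range(p):
        A[j][j] += lam
    w = solve(A, tmatvec(X1, y))

    # Model 2 with individual-level GWAS data: regress z on g = X2 w.
    g = matvec(X2, w)
    alpha_ind = dot(g, z) / dot(g, g)

    # Model 2 with summary statistics only: marginal effects b and LD matrix R.
    b = [v / n2 for v in tmatvec(X2, z)]
    R = [[v / n2 for v in row] for row in gram(X2)]
    alpha_ss = dot(w, b) / dot(w, matvec(R, w))
    return alpha_ind, alpha_ss


if __name__ == "__main__":
    a_ind, a_ss = two_stage_demo()
    print(f"individual-level: {a_ind:.6f}  summary-level: {a_ss:.6f}")
```

Here R is computed from the GWAS sample itself, which makes the two estimates agree up to rounding. In a real summary-statistics analysis R would come from a reference panel, so the agreement would only be approximate.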
Contact: jin.liu@duke-nus.edu.sg

Availability and implementation: The implementation of CoMM-S2 is included in the CoMM package, which can be downloaded from https://github.com/gordonliu8100822/CoMM.

Supplementary information: Supplementary data are available at Bioinformatics online.

[1] Clifford R. Jack et al. Association of Alzheimer's disease GWAS loci with MRI markers of brain aging, 2015, Neurobiology of Aging.

[2] Jin Liu et al. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies, 2017, Bioinformatics.

[3] Kaanan P. Shah et al. A gene-based association method for mapping traits using reference transcriptome data, 2015, Nature Genetics.

[4] Jean-Antoine Girault et al. PTK2B/Pyk2 overexpression improves a mouse model of Alzheimer's disease, 2018, Experimental Neurology.

[5] T. Lehtimäki et al. Integrative approaches for large-scale transcriptome-wide association studies, 2015, Nature Genetics.

[6] Xiang Zhu et al. Bayesian large-scale multiple regression with summary statistics from genome-wide association studies, 2016, bioRxiv.

[7] M. Stephens et al. Genome-wide Efficient Mixed Model Analysis for Association Studies, 2012, Nature Genetics.

[8] Eleazar Eskin et al. Identifying Causal Variants at Loci with Multiple Signals of Association, 2014, Genetics.

[9] Jin Liu et al. CoMM: a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information, 2018, Bioinformatics.

[10] E. Dermitzakis et al. Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations, 2010, PLoS Genetics.

[11] Alkes L. Price et al. Single-Tissue and Cross-Tissue Heritability of Gene Expression Via Identity-by-Descent in Related or Unrelated Individuals, 2011, PLoS Genetics.

[12] N. Andreasen et al. Pathways to Alzheimer's disease, 2014, Journal of Internal Medicine.

[13] Claude Bouchard et al. Genome-wide physical activity interactions in adiposity. A meta-analysis of 200,452 adults, 2017.

[14] Biswajit Padhy et al. Pseudoexfoliation and Alzheimer's associated CLU risk variant, rs2279590, lies within an enhancer element and regulates CLU, EPHX2 and PTK2B gene expression, 2017, Human Molecular Genetics.

[15] Ellen T. Gelfand et al. The Genotype-Tissue Expression (GTEx) project, 2013, Nature Genetics.

[16] Pim van der Harst et al. Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease, 2017, Circulation Research.

[18] Hongyu Zhao et al. A statistical framework for cross-tissue transcriptome-wide association analysis, 2018, Nature Genetics.

[19] Bo Wang et al. Inadequacy of interval estimates corresponding to variational Bayesian approximations, 2005, AISTATS.

[20] Matthew Stephens et al. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes, 2018, Nature Communications.

[21] M. Opper et al. Advanced mean field methods: theory and practice, 2001.

[22] Richard E. Turner et al. Two problems with variational expectation maximisation for time-series models, 2011.

[23] D. Rubin et al. Parameter expansion to accelerate EM: The PX-EM algorithm, 1997.

[24] Can Yang et al. REMI: Regression with Marginal Information and Its Application in Genome-Wide Association Studies, 2018, Statistica Sinica.

[25] H. Soininen et al. Functional screening of Alzheimer risk loci identifies PTK2B as an in vivo modulator and early marker of Tau pathology, 2016, Molecular Psychiatry.

[26] Jing Ma et al. MS4A Cluster in Alzheimer's Disease, 2014, Molecular Neurobiology.

[27] M. Owen et al. Increased expression of BIN1 mediates Alzheimer genetic risk by modulating tau pathology, 2013, Molecular Psychiatry.

[28] Christian Gieger et al. RL-SKAT: An Exact and Efficient Score Test for Heritability and Set Tests, 2017, Genetics.

[29] K. Strimmer et al. A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics, 2011, Statistical Applications in Genetics and Molecular Biology.

[30] Jian Huang et al. LPG: A four-group probabilistic approach to leveraging pleiotropy in genome-wide association studies, 2018, BMC Genomics.

[31] Nick C Fox et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease, 2013, Nature Genetics.

[32] L. Tan et al. Bridging integrator 1 (BIN1): form, function, and Alzheimer's disease, 2013, Trends in Molecular Medicine.

[33] Hae Kyung Im et al. Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues, 2016, bioRxiv.

[34] Elina Salmela et al. Genetic structure in Finland and Sweden: aspects of population history and gene mapping, 2012.

[35] David M. Blei et al. Variational Inference: A Review for Statisticians, 2016, arXiv.

[36] Helen E. Parkinson et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019, 2018, Nucleic Acids Research.

[37] P. Visscher et al. 10 Years of GWAS Discovery: Biology, Function, and Translation, 2017, American Journal of Human Genetics.

[38] H. Zou et al. Addendum: Regularization and variable selection via the elastic net, 2005.

[39] J. Danesh et al. Association analyses based on false discovery rate implicate new loci for coronary artery disease, 2017, Nature Genetics.

[40] Timothy J. Hohman et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer's disease risk, 2019, Nature Genetics.

[41] Pedro G. Ferreira et al. Transcriptome and genome sequencing uncovers functional variation in humans, 2013, Nature.

[42] C. Hoggart et al. Genome-wide association analysis of metabolic traits in a birth cohort from a founder population, 2008, Nature Genetics.

[43] Kenny Q. Ye et al. An integrated map of genetic variation from 1,092 human genomes, 2012, Nature.

[44] B. Ripley et al. Pattern Recognition, 1968, Nature.

[45] H. Zou et al. Regularization and variable selection via the elastic net, 2005.

[46] Jean-Baptiste Cazier et al. Genome-Wide Association Study in a Lebanese Cohort Confirms PHACTR1 as a Major Determinant of Coronary Artery Stenosis, 2012, PLoS ONE.

[47] Yi Yang et al. VIMCO: variational inference for multiple correlated outcomes in genome-wide association studies, 2018, Bioinformatics.

[48] Todd L Edwards et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics, 2018, Nature Communications.

[49] Hongyu Zhao et al. A statistical framework for cross-tissue transcriptome-wide association analysis, 2018, bioRxiv.