Transcriptome‐wide association studies accounting for colocalization using Egger regression

Integrating genome‐wide association (GWAS) and expression quantitative trait locus (eQTL) data into transcriptome‐wide association studies (TWAS) based on predicted expression can boost power to detect novel disease loci or pinpoint the susceptibility gene at a known disease locus. However, multiple eQTL genes often colocalize at disease loci, and confounding through linkage disequilibrium (LD) makes it challenging to identify the true susceptibility gene. To distinguish between true susceptibility genes (where the genetic effect on phenotype is mediated through expression) and colocalization due to LD, we examine an extension of the Mendelian randomization Egger (MR‐Egger) regression method that allows for LD while requiring only summary association data for both GWAS and eQTL. We derive the standard TWAS approach in the context of MR and show in simulations that the standard TWAS does not control type I error for causal gene identification when eQTLs have pleiotropic or LD‐confounded effects on disease. In contrast, LD‐aware MR‐Egger (LDA MR‐Egger) regression can control type I error in this case while attaining power similar to that of other methods in situations where these provide valid tests. However, when the direct effects of genetic variants on traits are correlated with the eQTL associations, all of the methods we examined, including LDA MR‐Egger regression, can have inflated type I error. We illustrate these methods by integrating gene expression with a recent large‐scale breast cancer GWAS to provide guidance on susceptibility gene identification.
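To make the idea of an Egger regression that allows for LD concrete, the following is a minimal sketch, not the authors' implementation. It assumes the regression is fitted by generalized least squares, with per-SNP GWAS effects regressed on per-SNP eQTL effects plus an intercept, and with the covariance of the GWAS estimates approximated from their standard errors and a reference-panel LD matrix. The function names `ld_covariance` and `lda_egger_sketch`, the covariance approximation, and the residual-scale estimate are all assumptions made for illustration.

```python
import numpy as np


def ld_covariance(se, ld):
    """Approximate covariance of GWAS effect estimates as se_i * se_j * r_ij.

    Assumption of this sketch: `ld` is a SNP correlation matrix taken from a
    reference panel and `se` holds the GWAS standard errors.
    """
    se = np.asarray(se, dtype=float)
    return np.outer(se, se) * np.asarray(ld, dtype=float)


def lda_egger_sketch(beta_gwas, beta_eqtl, cov_gwas):
    """GLS fit of beta_gwas = intercept + slope * beta_eqtl + error.

    The error covariance is taken to be proportional to `cov_gwas`.
    Returns (coef, se), each ordered as [intercept, slope].
    """
    y = np.asarray(beta_gwas, dtype=float)
    x = np.asarray(beta_eqtl, dtype=float)
    if y.size <= 2:
        raise ValueError("need more than two variants to fit two parameters")

    # Design matrix: a column of ones (intercept) and the eQTL effects (slope).
    X = np.column_stack([np.ones_like(x), x])

    # GLS normal equations: (X' W X) coef = X' W y, with W = cov_gwas^{-1}.
    W = np.linalg.inv(np.asarray(cov_gwas, dtype=float))
    XtWX = X.T @ W @ X
    coef = np.linalg.solve(XtWX, X.T @ W @ y)

    # Residual scale estimated from the data (a choice made in this sketch).
    resid = y - X @ coef
    sigma2 = (resid @ W @ resid) / (y.size - 2)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XtWX)))
    return coef, se
```

In this sketch the slope plays the role of the expression-mediated effect, and the intercept absorbs an average direct effect of the variants on the trait. Dividing each coefficient by its standard error gives a test statistic for the corresponding term.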
