Integrating tissue specific mechanisms into GWAS summary results

To understand the biological mechanisms underlying the thousands of genetic variants robustly associated with complex traits, we need scalable methods that integrate GWAS results with functional data generated by large-scale efforts. Here we propose a method, termed MetaXcan, that addresses this need by inferring the downstream consequences of the genetically regulated components of molecular traits on complex phenotypes using summary data only. MetaXcan accommodates multiple causal variants and flexible multivariate models, extending the capabilities of existing methods and enabling the testing of more complex processes. As an example application, we trained prediction models of gene expression levels in 44 human tissues and inferred the consequences of their regulation in 40 complex phenotypes. Our examination of this broad set of human tissues revealed many novel genes and re-identified known ones, with patterns of regulation in expected as well as unexpected tissues.
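
The abstract does not state the test statistic, so the following is only a minimal sketch of how a gene-level association could be computed from summary data alone. It assumes a linear prediction model for one gene in one tissue and combines per-SNP GWAS z-scores using the prediction weights and a reference-panel genotype covariance. The function name `summary_gene_zscore` and all argument names are hypothetical, chosen for illustration.

```python
import numpy as np


def summary_gene_zscore(weights, snp_cov, gwas_beta, gwas_se):
    """Approximate a gene-level z-score from GWAS summary statistics.

    Illustrative sketch only (assumed formulation, not taken from the abstract):
        z_gene ~= sum_l  w_l * (sigma_l / sigma_gene) * (beta_l / se_l)

    weights   : prediction weights w_l for the SNPs in one gene model
    snp_cov   : SNP-by-SNP genotype covariance from a reference panel
    gwas_beta : per-SNP GWAS effect size estimates
    gwas_se   : per-SNP GWAS standard errors
    """
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(snp_cov, dtype=float)
    z_snp = np.asarray(gwas_beta, dtype=float) / np.asarray(gwas_se, dtype=float)

    # Per-SNP standard deviations come from the diagonal of the covariance.
    sigma_snp = np.sqrt(np.diag(cov))

    # Variance of the predicted expression is w' * Cov * w.
    sigma_gene = np.sqrt(w @ cov @ w)
    if sigma_gene == 0:
        raise ValueError("predicted expression has zero variance")

    return float(np.sum(w * sigma_snp * z_snp) / sigma_gene)


# Toy usage with made-up numbers: two uncorrelated SNPs, equal weights.
z = summary_gene_zscore(
    weights=[1.0, 1.0],
    snp_cov=[[1.0, 0.0], [0.0, 1.0]],
    gwas_beta=[0.3, 0.1],
    gwas_se=[0.1, 0.1],
)
```

Because only weights, a reference covariance, and per-SNP summary statistics enter the calculation, a sketch like this needs no individual-level genotype or phenotype data, which is the property the abstract emphasizes.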
