Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases

Genome wide association studies (GWAS) provide a powerful approach for uncovering disease-associated variants in human, but fine-mapping the causal variants remains a challenge. This is partly remedied by prioritization of disease-associated variants that overlap GWAS-enriched epigenomic annotations. Here, we introduce a new Bayesian model RiVIERA (Risk Variant Inference using Epigenomic Reference Annotations) for inference of driver variants from summary statistics across multiple traits using hundreds of epigenomic annotations. In simulation, RiVIERA promising power in detecting causal variants and causal annotations, the multi-trait joint inference further improved the detection power. We applied RiVIERA to model the existing GWAS summary statistics of 9 autoimmune diseases and Schizophrenia by jointly harnessing the potential causal enrichments among 848 tissue-specific epigenomics annotations from ENCODE/Roadmap consortium covering 127 cell/tissue types and 8 major epigenomic marks. RiVIERA identified meaningful tissue-specific enrichments for enhancer regions defined by H3K4me1 and H3K27ac for Blood T-Cell specifically in the nine autoimmune diseases and Brain-specific enhancer activities exclusively in Schizophrenia. Moreover, the variants from the 95% credible sets exhibited high conservation and enrichments for GTEx whole-blood eQTLs located within transcription-factor-binding-sites and DNA-hypersensitive-sites. Furthermore, joint modeling the nine immune traits by simultaneously inferring and exploiting the underlying epigenomic correlation between traits further improved the functional enrichments compared to single-trait models.

[1]  Michael Q. Zhang,et al.  Integrative analysis of 111 reference human epigenomes , 2015, Nature.

[2]  Miao-Xin Li,et al.  FAPI: Fast and accurate P-value Imputation for genome-wide association study , 2015, European Journal of Human Genetics.

[3]  Andrew Gelman,et al.  The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo , 2011, J. Mach. Learn. Res..

[4]  Joseph K. Pickrell Joint analysis of functional genomic data and genome-wide association studies of 18 human traits , 2013, bioRxiv.

[5]  Jun S. Liu,et al.  The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans , 2015, Science.

[6]  Matti Pirinen,et al.  Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity , 2012 .

[7]  Simon C. Potter,et al.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls , 2007, Nature.

[8]  Timothy J. Durham,et al.  "Systematic" , 1966, Comput. J..

[9]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[10]  Manolis Kellis,et al.  Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments , 2013, Nucleic acids research.

[11]  Donald Geman,et al.  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images , 1984 .

[12]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[13]  P. Visscher,et al.  Five years of GWAS discovery. , 2012, American journal of human genetics.

[14]  Buhm Han,et al.  Chromatin marks identify critical cell types for fine mapping complex trait variants , 2012 .

[15]  Radford M. Neal Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .

[16]  Andrew Gelman,et al.  Handbook of Markov Chain Monte Carlo , 2011 .

[17]  Han Xu,et al.  Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. , 2014, American journal of human genetics.

[18]  R. Young,et al.  Super-Enhancers in the Control of Cell Identity and Disease , 2013, Cell.

[19]  Hongyu Zhao,et al.  GPA: A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Annotation , 2014, PLoS genetics.

[20]  Manolis Kellis,et al.  Interpreting non-coding variation in complex disease genetics , 2012, Nature Biotechnology.

[21]  F. Collins,et al.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits , 2009, Proceedings of the National Academy of Sciences.

[22]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[23]  S. Batzoglou,et al.  Linking disease associations with regulatory information in the human genome , 2012, Genome research.

[24]  Matti Pirinen,et al.  A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1 , 2010, Nature Genetics.

[25]  Jake K. Byrnes,et al.  Bayesian refinement of association signals for 14 loci in 3 common diseases , 2012, Nature Genetics.

[26]  S. Ferrari,et al.  Beta Regression for Modelling Rates and Proportions , 2004 .

[27]  G. Hong,et al.  Nucleic Acids Research , 2015, Nucleic Acids Research.

[28]  Shane J. Neph,et al.  An expansive human regulatory lexicon encoded in transcription factor footprints , 2012, Nature.

[29]  Peggy Hall,et al.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations , 2013, Nucleic Acids Res..

[30]  Radford M. Neal MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.

[31]  Eleazar Eskin,et al.  Identification of causal genes for complex traits , 2015, Bioinform..

[32]  Dirk Eddelbuettel,et al.  Rcpp: Seamless R and C++ Integration , 2011 .

[33]  E. Birney,et al.  Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt , 2009, Nature Protocols.

[34]  Hadley Wickham,et al.  The Split-Apply-Combine Strategy for Data Analysis , 2011 .

[35]  A. Kennedy,et al.  Hybrid Monte Carlo , 1988 .

[36]  C. Spencer,et al.  Biological Insights From 108 Schizophrenia-Associated Genetic Loci , 2014, Nature.

[37]  Peter M Visscher,et al.  Prediction of individual genetic risk of complex disease. , 2008, Current opinion in genetics & development.

[38]  Jeffrey S. Rosenthal,et al.  Optimal Proposal Distributions and Adaptive MCMC , 2011 .

[39]  Shane J. Neph,et al.  Systematic Localization of Common Disease-Associated Variation in Regulatory DNA , 2012, Science.

[40]  Hongyu Zhao,et al.  Integrative tissue-specific functional annotations in the human genome provide novel insights on many complex traits and improve signal prioritization in genome wide association studies , 2015, bioRxiv.

[41]  E. Eskin,et al.  Integrating Functional Data to Prioritize Causal Variants in Statistical Fine-Mapping Studies , 2014, PLoS genetics.

[42]  C. Bayes,et al.  A New Robust Regression Model for Proportions , 2012 .

[43]  Albert J. Vilella,et al.  A high-resolution map of human evolutionary constraint using 29 mammals , 2011, Nature.

[44]  Laura J. Scott,et al.  Psychiatric genome-wide association study analyses implicate neuronal, immune and histone pathways , 2015, Nature Neuroscience.

[45]  Gregory A. Poland,et al.  Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics , 2015, Genetics.

[46]  B. Pasaniuc,et al.  Leveraging Functional-Annotation Data in Trans-ethnic Fine-Mapping Studies. , 2015, American journal of human genetics.

[47]  Life Technologies,et al.  A map of human genome variation from population-scale sequencing , 2011 .

[48]  Manolis Kellis,et al.  Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers , 2015, Nature Genetics.

[49]  Sylvia Richardson,et al.  Dissection of a Complex Disease Susceptibility Region Using a Bayesian Stochastic Search Approach to Fine Mapping , 2015, bioRxiv.

[50]  Robert Gentleman,et al.  rtracklayer: an R package for interfacing with genome browsers , 2009, Bioinform..

[51]  Simon C. Potter,et al.  Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants , 2007, Nature Genetics.

[52]  M. Stephens,et al.  Bayesian statistical methods for genetic association studies , 2009, Nature Reviews Genetics.

[53]  S. Richardson,et al.  Dissection of a complex disease susceptibility region using a Bayesian stochastic search approach to fine mapping , 2015, bioRxiv.

[54]  M. Ritchie,et al.  Methods of integrating data to uncover genotype–phenotype interactions , 2015, Nature Reviews Genetics.

[55]  Peter Donnelly,et al.  HAPGEN2: simulation of multiple disease SNPs , 2011, Bioinform..

[56]  M. Daly,et al.  Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis , 2013, The Lancet.

[57]  Xiaoquan Wen,et al.  Cross-Population Joint Analysis of eQTLs: Fine Mapping and Functional Annotation , 2014, bioRxiv.

[58]  M. Daly,et al.  Genetic and Epigenetic Fine-Mapping of Causal Autoimmune Disease Variants , 2014, Nature.

[59]  Joseph K. Pickrell,et al.  Approximately independent linkage disequilibrium blocks in human populations , 2015, bioRxiv.