A framework for integrating directed and undirected annotations to build explanatory models of cis-eQTL data

A longstanding goal of regulatory genetics is to understand how variants in genome sequences lead to changes in gene expression. Here we present Bayesian Annotation Guided eQTL Analysis (BAGEA), a variational Bayes framework that models cis-eQTLs using directed and undirected genomic annotations. In a use case, we integrated directed genomic annotations with eQTL summary statistics from tissues of various origins. This analysis revealed epigenetic marks that are relevant for gene expression in different tissues and cell types. We then estimated the predictive power of the models fitted with directed genomic annotations. Depending on the underlying eQTL data, the directed genomic annotations could predict up to 1.5% of the variance observed in the expression of genes whose top nominal eQTL association p-value was below 10⁻⁷. For genes with estimated effect sizes in the top quartile, up to 5% of the expression variance could be predicted. Based on these results, we recommend BAGEA for the analysis of cis-eQTL data to reveal annotations relevant to expression biology.
