TreeMap: a structured approach to fine mapping of eQTL variants

Motivation: Expression quantitative trait loci (eQTL) harbor genetic variants that modulate gene transcription. Fine mapping of regulatory variants at these loci is difficult because causal and linked variants sit side by side at a locus and because multiple variants are likely to interact. The problem is worse in genes with multiple cis-acting eQTL, where the superimposed effects of adjacent loci further distort the association signals.

Results: We developed a novel algorithm, TreeMap, that identifies putative causal variants in cis-eQTL while accounting for multisite effects and genetic linkage at a locus. Guided by the hierarchical structure of linkage disequilibrium, TreeMap performs an organized search for individual and multiple causal variants. Via extensive simulations, we show that TreeMap detects co-regulating variants more accurately than current methods. Furthermore, its high computational efficiency enables genome-wide analysis of long-range eQTL. We applied TreeMap to GTEx data from brain hippocampus and transverse colon samples to search for eQTL in gene bodies and in 4-Mbp gene-flanking regions, discovering numerous distal eQTL. We also found concordant distal eQTL present in both brain and colon samples, implying long-range regulation of gene expression.

Availability: TreeMap is available as an R package enabled for parallel processing at https://github.com/liliulab/treemap.
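To make the idea of a search "guided by the hierarchical structure of linkage disequilibrium" more concrete, the following is a minimal sketch of how variants might be grouped into LD blocks before any causal-variant search. It is not TreeMap's implementation (which is an R package); the function names `r_squared` and `ld_blocks`, the toy genotypes, and the 0.8 threshold are all assumptions made for illustration. The grouping is equivalent to cutting a single-linkage dendrogram built on the distance 1 - r^2 at one level.

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic variant: r^2 undefined, treat as 0
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def ld_blocks(genotypes, threshold=0.8):
    """Group variants linked by r^2 >= threshold, directly or through a chain.

    Hypothetical helper: one cut of a single-linkage tree, not the full hierarchy.
    """
    names = list(genotypes)
    parent = {n: n for n in names}  # union-find forest over variant names

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two blocks

    blocks = {}
    for n in names:
        blocks.setdefault(find(n), []).append(n)
    return sorted(blocks.values())


# Toy dosages for 8 individuals (made-up data, for illustration only).
toy = {
    "snp1": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp2": [0, 1, 2, 0, 1, 2, 0, 2],
    "snp3": [2, 0, 1, 1, 0, 2, 1, 0],
    "snp4": [2, 0, 1, 1, 0, 2, 1, 0],
}
print(ld_blocks(toy))  # [['snp1', 'snp2'], ['snp3', 'snp4']]
```

Repeating the cut at several thresholds would yield nested blocks, which is one simple way to picture a hierarchy of linked variants that a structured search could descend.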
