Applying meta-analysis to genotype-tissue expression data from multiple tissues to identify eQTLs and increase the number of eGenes

Motivation: There is recent interest in using gene expression data to contextualize findings from traditional genome-wide association studies (GWAS). Within a given tissue, expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide strong supporting evidence for GWAS hits and important insights into the regulatory pathways involved in many diseases. When a significant variant or a candidate gene identified by GWAS is also an eQTL or eGene, there is strong evidence for studying this variant or gene further. Multi-tissue gene expression datasets such as the Genotype-Tissue Expression (GTEx) data are used to find eQTLs and eGenes. Unfortunately, these datasets often have small sample sizes in some tissues. For this reason, many meta-analysis methods have been designed to combine gene expression data across tissues and thereby increase power for finding eQTLs and eGenes. However, these existing techniques do not scale to datasets containing many tissues, such as the GTEx data. Furthermore, these methods ignore the biological insight that the same variant may be associated with the same gene across similar tissues.

Results: We introduce a meta-analysis model that addresses these problems in existing methods. We focus on the problem of finding eGenes in gene expression data from many tissues, and show that our model performs better than other types of meta-analyses.

Availability and Implementation: Source code is at https://github.com/datduong/RECOV.

Contact: eeskin@cs.ucla.edu or datdb@cs.ucla.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
