Exploiting expression patterns across multiple tissues to map expression quantitative trait loci

Background
To better understand complex diseases, it is important to understand how genetic variation in regulatory regions affects gene expression. Genetic variants in these regulatory regions have been shown to activate transcription in a tissue-specific manner. It is therefore important to map such expression quantitative trait loci (eQTLs) with a statistically principled approach that jointly models all tissues and uses all available information to maximize the power of eQTL mapping. In this context, we propose a score test-based approach that models tissue specificity as a random effect and tests for an overall shift in gene expression combined with tissue-specific effects due to genetic variants.

Results
Compared with other existing joint modeling approaches such as MetaTissue eQTL and eQTL-BMA, our approach has 1) a distinct computational advantage and 2) comparable statistical power. Using simulations, we show that our method has greater power to detect eQTLs than a tissue-by-tissue approach and can be computationally faster than MetaTissue eQTL and eQTL-BMA. We apply our method to two publicly available expression datasets from normal human brains: one comprising four brain regions from 150 neuropathologically normal samples, and another comprising ten brain regions from 134 neuropathologically normal samples. By jointly analyzing multiple brain regions with our method, we identify eQTLs in more genes than three commonly used existing methods do.

Conclusions
Because we use a score test-based approach, no parameters need to be estimated under the alternative hypothesis. As a result, model parameters only have to be estimated once per genome, which substantially decreases computation time. Our method also accommodates the analysis of next-generation sequencing data. For example, by modeling gene transcripts analogously to tissues in our current formulation, one could test both for an overall variant effect across all isoforms of a gene and for transcript-specific effects. We implement our approach in the R package JAGUAR, which is available from the Comprehensive R Archive Network (CRAN).
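The abstract describes a score test that treats tissue specificity as a random effect and needs parameter estimates only under the null hypothesis. The Python sketch below illustrates that idea under assumptions of our own. It is not JAGUAR's implementation or API, and the functions `fit_null`, `joint_score_test` and `chi2_sf` are hypothetical. It assumes the model y_ik = mu_k + b_i + (gamma + g_k) x_i + e_ik for subject i and tissue k, with a random subject intercept b_i and a tissue-specific random SNP effect g_k of variance sigma_g^2. It also assumes balanced data without covariates, null variance components from ANOVA moment estimators rather than (RE)ML, an unweighted sum of the overall-shift and tissue-specific score components, and a Satterthwaite approximation for the p-value. The null-model quantities `P` and `w` do not involve genotypes, so they are computed once and reused for every candidate SNP tested against the gene.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper tail P(X > x), X ~ chi-square(df), df possibly non-integer.

    Regularized upper incomplete gamma Q(df/2, x/2): power series for small
    x, modified-Lentz continued fraction otherwise.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    log_pref = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        ap, term = a, 1.0 / a  # series for the lower gamma P(a, z)
        total = term
        for _ in range(10_000):
            ap += 1.0
            term *= z / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_pref))
    tiny = 1e-300
    b, c = z + 1.0 - a, 1.0 / tiny
    d = h = 1.0 / b
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return math.exp(log_pref) * h


def fit_null(Y):
    """Hypothetical null fit y_ik = mu_k + b_i + e_ik (balanced, no covariates).

    Y has shape (n subjects, K tissues). Returns P and w = P y, where y is Y
    stacked subject by subject (index i*K + k).
    """
    n, K = Y.shape
    subj, tiss, grand = Y.mean(axis=1), Y.mean(axis=0), Y.mean()
    # Two-way ANOVA moment estimators: subjects random, tissues fixed.
    resid = Y - subj[:, None] - tiss[None, :] + grand
    sigma_e2 = (resid ** 2).sum() / ((n - 1) * (K - 1))
    ms_subj = K * ((subj - grand) ** 2).sum() / (n - 1)
    sigma_b2 = max((ms_subj - sigma_e2) / K, 0.0)
    # V = I_n (x) Sigma, with compound-symmetric Sigma within each subject.
    Sigma = sigma_e2 * np.eye(K) + sigma_b2 * np.ones((K, K))
    V_inv = np.kron(np.eye(n), np.linalg.inv(Sigma))
    X = np.kron(np.ones((n, 1)), np.eye(K))  # tissue-specific intercepts
    VX = V_inv @ X
    P = V_inv - VX @ np.linalg.solve(X.T @ VX, VX.T)
    return P, P @ Y.reshape(-1)


def joint_score_test(P, w, x, K):
    """Score-type test of H0: gamma = 0 and sigma_g^2 = 0 for one SNP x."""
    Zg = np.kron(x[:, None], np.eye(K))  # column k: x_i in tissue k only
    s = Zg.T @ w                         # per-tissue scores
    W = np.eye(K) + np.ones((K, K))      # I: tissue-specific, J: overall shift
    Q = float(s @ W @ s)                 # = sum(s**2) + sum(s)**2
    WA = W @ (Zg.T @ P @ Zg)             # s ~ N(0, Zg' P Zg) under H0
    tr1, tr2 = np.trace(WA), np.trace(WA @ WA)
    scale, df = tr2 / tr1, tr1 ** 2 / tr2  # Satterthwaite: Q ~ scale * chi2(df)
    return chi2_sf(Q / scale, df)


# Hypothetical simulated data: 200 subjects, 4 tissues, one true eQTL.
rng = np.random.default_rng(2024)
n, K = 200, 4
b = rng.normal(0.0, 0.5, size=n)
Y = np.arange(K, dtype=float) + b[:, None] + rng.normal(0.0, 0.5, size=(n, K))
x_null = rng.binomial(2, 0.3, size=n).astype(float)
x_eqtl = rng.binomial(2, 0.3, size=n).astype(float)
Y += 1.0 * x_eqtl[:, None]  # overall shift of 1 unit per allele, all tissues

P, w = fit_null(Y)  # null model fitted once; reused for every SNP below
p_null, p_eqtl = [joint_score_test(P, w, x, K) for x in (x_null, x_eqtl)]
print(f"unlinked SNP p = {p_null:.3g}, simulated eQTL p = {p_eqtl:.3g}")
```

In this sketch, testing another SNP requires no refitting of the null model, only the products in `joint_score_test`. Avoiding a model fit under the alternative for every SNP is the source of the computational savings the abstract attributes to the score test.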
