Detecting regulatory gene-environment interactions with unmeasured environmental factors

MOTIVATION Genomic studies have revealed a substantial heritable component of the transcriptional state of the cell. To fully understand the genetic regulation of gene expression variability, it is important to study the effect of genotype in the context of external factors such as alternative environmental conditions. In model systems, explicit environmental perturbations have been considered for this purpose, allowing to directly test for environment-specific genetic effects. However, such experiments are limited to species that can be profiled in controlled environments, hampering their use in important systems such as human. Moreover, even in seemingly tightly regulated experimental conditions, subtle environmental perturbations cannot be ruled out, and hence unknown environmental influences are frequent. Here, we propose a model-based approach to simultaneously infer unmeasured environmental factors from gene expression profiles and use them in genetic analyses, identifying environment-specific associations between polymorphic loci and individual gene expression traits. RESULTS In extensive simulation studies, we show that our method is able to accurately reconstruct environmental factors and their interactions with genotype in a variety of settings. We further illustrate the use of our model in a real-world dataset in which one environmental factor has been explicitly experimentally controlled. Our method is able to accurately reconstruct the true underlying environmental factor even if it is not given as an input, allowing to detect genuine genotype-environment interactions. In addition to the known environmental factor, we find unmeasured factors involved in novel genotype-environment interactions. Our results suggest that interactions with both known and unknown environmental factors significantly contribute to gene expression variability. AVAILABILITY and implementation: Software available at http://pmbio.github.io/envGPLVM/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

[1]  P. Deloukas,et al.  Patterns of Cis Regulatory Variation in Diverse Human Populations , 2012, PLoS genetics.

[2]  R. Guigó,et al.  Transcriptome genetics using second generation sequencing in a Caucasian population , 2010, Nature.

[3]  Joseph K. Pickrell,et al.  Understanding mechanisms underlying human gene expression variation with RNA sequencing , 2010, Nature.

[4]  W. Huber,et al.  Model-based variance-stabilizing transformation for Illumina microarray data , 2008, Nucleic acids research.

[5]  T. Höfken,et al.  The Rho GDI Rdi1 regulates Rho GTPases by distinct mechanisms. , 2008, Molecular biology of the cell.

[6]  W. Huber,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[7]  Leopold Parts,et al.  A Bayesian Framework to Account for Complex Non-Genetic Factors in Gene Expression Levels Greatly Increases Power in eQTL Studies , 2010, PLoS Comput. Biol..

[8]  David Heckerman,et al.  Correction for hidden confounders in the genetic analysis of gene expression , 2010, Proceedings of the National Academy of Sciences.

[9]  M. McCarthy,et al.  Genome-wide association studies for complex traits: consensus, uncertainty and challenges , 2008, Nature Reviews Genetics.

[10]  J. Castle,et al.  An integrative genomics approach to infer causal associations between gene expression and disease , 2005, Nature Genetics.

[11]  R. Durbin,et al.  Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses , 2012, Nature Protocols.

[12]  Amnon Horovitz,et al.  ATP-induced allostery in the eukaryotic chaperonin CCT is abolished by the mutation G345D in CCT4 that renders yeast temperature-sensitive for growth. , 2008, Journal of molecular biology.

[13]  Richard E. Baker,et al.  Scm3, an essential Saccharomyces cerevisiae centromere protein required for G2/M progression and Cse4 localization , 2007, Proceedings of the National Academy of Sciences.

[14]  Bo-Juen Chen,et al.  Modularity and interactions in the genetics of gene expression , 2009, Proceedings of the National Academy of Sciences.

[15]  G. Gibson The environmental contribution to gene expression profiles , 2008, Nature Reviews Genetics.

[16]  Debbie S. Yuster,et al.  A complete classification of epistatic two-locus models , 2006, BMC Genetics.

[17]  John D. Storey,et al.  Statistical significance for genomewide studies , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[18]  C. Boonchird,et al.  Genome-wide identification of genes involved in tolerance to various environmental stresses inSaccharomyces cerevisiae , 2010, Journal of Applied Genetics.

[19]  L. Kruglyak,et al.  Gene–Environment Interaction in Yeast Gene Expression , 2008, PLoS biology.

[20]  Jorge Nocedal,et al.  Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization , 1997, TOMS.

[21]  L. Kruglyak,et al.  Genetic Dissection of Transcriptional Regulation in Budding Yeast , 2002, Science.

[22]  Neil D. Lawrence,et al.  Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies , 2012, PLoS Comput. Biol..

[23]  Simon C. Potter,et al.  The Architecture of Gene Regulatory Variation across Multiple Human Tissues: The MuTHER Study , 2011, PLoS genetics.

[24]  Vipin T. Sreedharan,et al.  Multiple reference genomes and transcriptomes for Arabidopsis thaliana , 2011, Nature.

[25]  Ying Liu,et al.  FaST linear mixed models for genome-wide association studies , 2011, Nature Methods.

[26]  Chun Jimmie Ye,et al.  Accurate Discovery of Expression Quantitative Trait Loci Under Confounding From Spurious and Genuine Regulatory Hotspots , 2008, Genetics.

[27]  David Mackay,et al.  Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks , 1995 .

[28]  L. B. Snoek,et al.  Genome-wide gene expression regulation as a function of genotype and age in C. elegans. , 2010, Genome research.

[29]  M. Tyers,et al.  Cdc53 is a scaffold protein for multiple Cdc34/Skp1/F-box proteincomplexes that regulate cell division and methionine biosynthesis in yeast. , 1998, Genes & development.

[30]  D. Reich,et al.  Principal components analysis corrects for stratification in genome-wide association studies , 2006, Nature Genetics.

[31]  H. Kang,et al.  Variance component model to account for sample structure in genome-wide association studies , 2010, Nature Genetics.

[32]  D. Heckerman,et al.  Efficient Control of Population Structure in Model Organism Association Mapping , 2008, Genetics.

[33]  R. Doerge,et al.  Global eQTL Mapping Reveals the Complex Genetic Architecture of Transcript-Level Variation in Arabidopsis , 2007, Genetics.

[34]  Neil D. Lawrence,et al.  Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models , 2005, J. Mach. Learn. Res..

[35]  D. Koller,et al.  Population genomics of human gene expression , 2007, Nature Genetics.

[36]  Greg Gibson,et al.  Using Blood Informative Transcripts in Geographical Genomics: Impact of Lifestyle on Gene Expression in Fijians , 2012, Front. Gene..

[37]  John D. Storey,et al.  Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis , 2007, PLoS genetics.

[38]  Alkes L. Price,et al.  New approaches to population stratification in genome-wide association studies , 2010, Nature Reviews Genetics.

[39]  Simon C. Potter,et al.  Mapping cis- and trans-regulatory effects across multiple tissues in twins , 2012, Nature Genetics.

[40]  R. Ophoff,et al.  Unraveling the Regulatory Mechanisms Underlying Tissue-Dependent Genetic Variation of Gene Expression , 2012, PLoS genetics.