Learning a Prior on Regulatory Potential from eQTL Data

Genome-wide RNA expression data provide a detailed view of an organism's biological state; hence, a dataset measuring expression variation between genetically diverse individuals (eQTL data) may provide important insights into the genetics of complex traits. However, with data from a relatively small number of individuals, it is difficult to distinguish true causal polymorphisms from the much larger number of candidate polymorphisms. The problem is particularly challenging in populations with significant linkage disequilibrium, where traits are often linked to large chromosomal regions containing many genes. Here, we present a novel method, Lirnet, that automatically learns a regulatory potential for each sequence polymorphism, estimating how likely it is to have a significant effect on gene expression. This regulatory potential is defined in terms of “regulatory features” (including the function of the gene and the conservation, type, and position of genetic polymorphisms) that are available for any organism. The extent to which each feature influences the regulatory potential is learned automatically, making Lirnet readily applicable to different datasets, organisms, and feature sets. We apply Lirnet both to the human HapMap eQTL dataset and to a yeast eQTL dataset and provide statistical and biological results demonstrating that Lirnet produces significantly better regulatory programs than other recent approaches. In the yeast data, we demonstrate that Lirnet can correctly suggest a specific causal sequence variation within a large, linked chromosomal region. In one example, Lirnet uncovered a novel, experimentally validated connection between Puf3 (a sequence-specific RNA-binding protein) and P-bodies (cytoplasmic structures that regulate translation and RNA stability), and identified the particular causative polymorphism, a SNP in Mkt1, that induces the variation in the pathway.
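The abstract describes the regulatory potential only in words. What follows is a minimal sketch, not Lirnet's implementation. It rests on two assumptions that the text above does not state. First, the potential is a logistic function of a weighted sum of a polymorphism's regulatory features. Second, the potential enters a sparse (Lasso-style) regression of one gene's expression on genotypes by lowering the L1 penalty on SNPs with high potential. The feature names, weights, bias, and the `1 - potential` penalty mapping are hypothetical. The sketch also omits the step in which Lirnet learns the feature weights automatically.

```python
import math
from typing import Dict, List


def regulatory_potential(features: Dict[str, float],
                         weights: Dict[str, float],
                         bias: float) -> float:
    """Assumed form: logistic function of a weighted sum of regulatory features.

    Features missing from `features` are treated as 0.0.
    """
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t (the proximal operator of t * |x|)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def weighted_lasso(X: List[List[float]], y: List[float],
                   penalties: List[float], lam: float,
                   n_sweeps: int = 200) -> List[float]:
    """Cyclic coordinate descent for
    (1/2n) * ||y - X b||^2 + lam * sum_j penalties[j] * |b_j|.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, starting from beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue  # an all-zero genotype column carries no signal
            # Correlation of column j with the residual that excludes b_j.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam * penalties[j]) / scale
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_b
    return beta


if __name__ == "__main__":
    # Hypothetical per-SNP features and hand-fixed weights
    # (Lirnet learns such weights; this sketch does not).
    weights = {"conserved": 1.5, "nonsynonymous": 1.0, "in_promoter": 0.8}
    snps = [
        {"conserved": 1.0, "nonsynonymous": 1.0},  # conserved coding change
        {},                                         # no regulatory features
    ]
    potentials = [regulatory_potential(f, weights, bias=-1.0) for f in snps]
    # Assumed mapping: higher potential -> weaker L1 penalty on that SNP.
    penalties = [1.0 - rp for rp in potentials]

    # Toy genotypes (rows = individuals, columns = SNPs) and one expression trait.
    X = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
    y = [2.0, 0.0, 2.0, 0.0, 2.0, 0.0]
    beta = weighted_lasso(X, y, penalties, lam=0.1)
    print("potentials:", [round(rp, 3) for rp in potentials])
    print("coefficients:", [round(b, 3) for b in beta])
```

Under these assumptions, a SNP whose features suggest a regulatory role pays a smaller penalty. When several linked SNPs explain a gene's expression about equally well, such a prior tilts the sparse fit toward the SNP with the higher potential.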
