Gaining insights into gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein-protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Finally, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif-based or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.
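
To make the supervised formulation concrete, the following is a minimal sketch of edge prediction as binary classification over candidate TF-target pairs. It is an illustration only, not the method used in this work: the choice of logistic regression, the feature names in FEATURES, the synthetic data generator, and all function names are assumptions introduced here. Each candidate edge is described by one value per evidence type, and the classifier learns to combine them into a single edge score.

```python
import math
import random

# Hypothetical feature names, one per evidence type named in the abstract.
FEATURES = ["tf_binding", "conserved_motif", "expression_corr", "chromatin_corr"]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_edges(n, rng):
    """Synthetic candidate edges (assumption): true edges score higher on average."""
    rows, labels = [], []
    for _ in range(n):
        is_edge = rng.random() < 0.5
        shift = 1.0 if is_edge else 0.0
        rows.append([rng.gauss(shift, 1.0) for _ in FEATURES])
        labels.append(1 if is_edge else 0)
    return rows, labels


def train_logistic(rows, labels, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent on the log loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def edge_score(x, w, b):
    """Predicted probability that a candidate TF-target pair is a regulatory edge."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


rng = random.Random(0)
train_x, train_y = make_toy_edges(400, rng)  # stands in for known interactions
test_x, test_y = make_toy_edges(200, rng)    # held-out candidate edges
w, b = train_logistic(train_x, train_y)
accuracy = sum(
    (edge_score(x, w, b) > 0.5) == bool(y) for x, y in zip(test_x, test_y)
) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
print(dict(zip(FEATURES, (round(wi, 2) for wi in w))))
```

In this toy setting every feature is equally informative by construction, so the learned weights come out similar. With real data, the relative weights would give a rough indication of how much each evidence type contributes to the integrated score.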
