Gene networks in Drosophila melanogaster: integrating experimental data to predict gene function

Background
Discovering the functions of all genes is a central goal of contemporary biomedical research. Despite considerable effort, we are still far from achieving this goal in any metazoan organism. Collectively, the growing body of high-throughput functional genomics data provides evidence of gene function, but it remains difficult to interpret.

Results
We constructed the first network of functional relationships for Drosophila melanogaster by integrating most of the available comprehensive sets of genetic interaction, protein-protein interaction, and microarray expression data. The complete integrated network covers 85% of the currently known genes; we refined it to a high-confidence network that includes 20,000 functional relationships among 5,021 genes. An analysis of the network revealed a remarkable concordance with prior knowledge. Using the network, we inferred a set of high-confidence Gene Ontology biological process annotations for 483 of the roughly 5,000 previously unannotated genes. We also show that this approach can infer annotations for a class of genes that cannot be annotated based on sequence similarity alone. Lastly, we demonstrate the utility of the network by reanalyzing gene expression data both to discover clusters of coregulated genes and to compile a list of candidate genes related to specific biological processes.

Conclusions
Here we present the first genome-wide functional gene network in D. melanogaster. The network enables the exploration, mining, and reanalysis of existing experimental data, as well as the interpretation of new data. The inferred annotations provide testable hypotheses about the functions of previously uncharacterized genes.
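The abstract does not spell out how annotations are transferred across the network. One common, generic way to do this on a weighted functional network is guilt-by-association, also called neighbor voting: an unannotated gene is assigned the terms carried by the neighbors it is most strongly linked to. The sketch below illustrates only that generic idea and is not presented as the authors' method. The function name `predict_annotations`, the `min_score` cutoff of 0.5, the edge weights, and the toy gene and GO identifiers are all hypothetical.

```python
from collections import defaultdict


def predict_annotations(edges, annotations, min_score=0.5):
    """Hypothetical guilt-by-association sketch (not the paper's method).

    edges: iterable of (gene_a, gene_b, weight), weight in [0, 1]
    annotations: dict mapping gene -> set of known GO term IDs
    Returns {gene: {term: score}} for unannotated genes, keeping only
    terms whose score is at least min_score (an assumed cutoff).
    """
    # Build an undirected, weighted adjacency list.
    neighbors = defaultdict(dict)
    for a, b, w in edges:
        neighbors[a][b] = w
        neighbors[b][a] = w

    predictions = {}
    for gene, nbrs in neighbors.items():
        if annotations.get(gene):
            continue  # only predict for genes with no known annotation
        # Total edge weight to neighbors that carry any annotation.
        total = sum(w for n, w in nbrs.items() if annotations.get(n))
        if total == 0:
            continue  # no annotated neighbors, so no evidence to transfer
        votes = defaultdict(float)
        for n, w in nbrs.items():
            for term in annotations.get(n, ()):
                votes[term] += w
        # Score = fraction of annotated-neighbor weight supporting the term.
        scored = {t: v / total for t, v in votes.items() if v / total >= min_score}
        if scored:
            predictions[gene] = scored
    return predictions


# Toy example with made-up genes and weights.
edges = [("g1", "g2", 0.9), ("g1", "g3", 0.6), ("g1", "g4", 0.2), ("g3", "g4", 0.8)]
annotations = {"g2": {"GO:0006412"}, "g3": {"GO:0006412"}, "g4": {"GO:0007165"}}
print(predict_annotations(edges, annotations))
```

Here `g1` is linked mostly to neighbors annotated with the same term, so that term receives about 88% of its annotated-neighbor weight and passes the cutoff, while the weakly linked term does not. Restricting the vote to high-weight edges, as a high-confidence network does, is one way such a procedure trades coverage for precision.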
