24 Bioinformatic Prediction of Yeast Gene Function

Publisher Summary: A wide variety of approaches have been developed to predict gene function. They range from sequence analyses that assign genes to functional families, through structural analyses that assign protein folds and active sites, to phylogenetic analyses that subdivide gene families into functional subgroups or predict interacting partners. Because gene function takes so many forms, from the corresponding protein's biochemical activity to its physical interaction partners to its membership in a given pathway, the chapter discusses only the network aspects of gene function: a protein's interaction and pathway partners and the functional inferences that derive from them. One of the most effective strategies for inferring pathway-type functional information has turned out to be guilt by association. The chapter discusses the inference of yeast gene function via guilt-by-association approaches, illustrates a variety of relevant functional and comparative genomics approaches, and shows how integrating them predicts gene function more accurately. It also describes how these approaches can be made quantitative by estimating the error rates in the data and in the predicted gene functions.
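As a rough illustration of the guilt-by-association idea, the sketch below assigns an uncharacterized gene the function most common among its network neighbors, and then estimates an error rate by re-predicting the function of each already annotated gene from its neighbors alone. It is a minimal sketch under stated assumptions, not the chapter's method: the gene names, links, annotations, and the function names `predict_function` and `leave_one_out_error` are all hypothetical, and simple majority voting stands in for whatever scoring scheme a real analysis would use.

```python
from collections import Counter

# Hypothetical toy network: each gene maps to its linked (associated) genes.
network = {
    "geneA": ["geneB", "geneC", "geneD"],
    "geneB": ["geneA", "geneC"],
    "geneC": ["geneA", "geneB"],
    "geneD": ["geneA", "geneE"],
    "geneE": ["geneD"],
}

# Hypothetical known annotations; geneA is treated as uncharacterized.
annotations = {
    "geneB": {"ribosome biogenesis"},
    "geneC": {"ribosome biogenesis"},
    "geneD": {"splicing"},
    "geneE": {"splicing"},
}


def predict_function(gene, network, annotations):
    """Return (most common neighbor annotation, fraction of votes it received)."""
    votes = Counter()
    for neighbor in network.get(gene, ()):
        for term in annotations.get(neighbor, ()):
            votes[term] += 1
    if not votes:
        return None, 0.0  # no annotated neighbors, so no prediction
    term, count = votes.most_common(1)[0]
    return term, count / sum(votes.values())


def leave_one_out_error(network, annotations):
    """Fraction of annotated genes whose neighbor-based prediction is wrong.

    A gene's own annotation never votes for it, because only its
    neighbors contribute (the toy network has no self-links).
    """
    tested = wrong = 0
    for gene, known_terms in annotations.items():
        predicted, _ = predict_function(gene, network, annotations)
        if predicted is None:
            continue  # skip genes with no annotated neighbors
        tested += 1
        if predicted not in known_terms:
            wrong += 1
    return wrong / tested if tested else float("nan")


term, support = predict_function("geneA", network, annotations)
error_rate = leave_one_out_error(network, annotations)
print(term, round(support, 2), error_rate)
```

On this toy input, two of geneA's three annotated neighbors carry the same label, so that label wins with two thirds of the votes. The leave-one-out pass is one simple way to attach an error estimate to such predictions; on real data the links themselves would also carry error rates that a quantitative approach would need to account for.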
