AVID: An integrative framework for discovering functional relationships among proteins

BackgroundDetermining the functions of uncharacterized proteins is one of the most pressing problems in the post-genomic era. Large scale protein-protein interaction assays, global mRNA expression analyses and systematic protein localization studies provide experimental information that can be used for this purpose. The data from such experiments contain many false positives and false negatives, but can be processed using computational methods to provide reliable information about protein-protein relationships and protein function. An outstanding and important goal is to predict detailed functional annotation for all uncharacterized proteins that is reliable enough to effectively guide experiments.ResultsWe present AVID, a computational method that uses a multi-stage learning framework to integrate experimental results with sequence information, generating networks reflecting functional similarities among proteins. We illustrate use of the networks by making predictions of detailed Gene Ontology (GO) annotations in three categories: molecular function, biological process, and cellular component. Applied to the yeast Saccharomyces cerevisiae, AVID provides 37,451 pair-wise functional linkages between 4,191 proteins. These relationships are ~65–78% accurate, as assessed by cross-validation testing. Assignments of highly detailed functional descriptors to proteins, based on the networks, are estimated to be ~67% accurate for GO categories describing molecular function and cellular component and ~52% accurate for terms describing biological process. The predictions cover 1,490 proteins with no previous annotation in GO and also assign more detailed functions to many proteins annotated only with less descriptive terms. Predictions made by AVID are largely distinct from those made by other methods. Out of 37,451 predicted pair-wise relationships, the greatest number shared in common with another method is 3,413.ConclusionAVID provides three networks reflecting functional associations among proteins. We use these networks to generate new, highly detailed functional predictions for roughly half of the yeast proteome that are reliable enough to drive targeted experimental investigations. The predictions suggest many specific, testable hypotheses. All of the data are available as downloadable files as well as through an interactive website at http://web.mit.edu/biology/keating/AVID. Thus, AVID will be a valuable resource for experimental biologists.

[1]  Song Tan,et al.  Yeast enhancer of polycomb defines global Esa1-dependent acetylation of chromatin. , 2003, Genes & development.

[2]  Brian E Snydsman,et al.  Assigning function to yeast proteins by integration of technologies. , 2003, Molecular cell.

[3]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[4]  Frederick P. Roth,et al.  Predicting co-complexed protein pairs using genomic and proteomic data integration , 2004, BMC Bioinformatics.

[5]  Localization, Localization, Localization , 2004 .

[6]  Michael Lappe,et al.  From gene networks to gene function. , 2003, Genome research.

[7]  P. Kemmeren,et al.  Protein interaction verification and functional annotation by integrated analysis of genome-scale data. , 2002, Molecular cell.

[8]  B. Snel,et al.  Comparative assessment of large-scale data sets of protein–protein interactions , 2002, Nature.

[9]  B. Snel,et al.  Function prediction and protein networks. , 2003, Current opinion in cell biology.

[10]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.

[11]  E. O’Shea,et al.  Global analysis of protein localization in budding yeast , 2003, Nature.

[12]  M. Gerstein,et al.  A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data , 2003, Science.

[13]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[14]  Jason Weston,et al.  Learning Gene Functional Classifications from Multiple Data Types , 2002, J. Comput. Biol..

[15]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[16]  Christian von Mering,et al.  STRING: a database of predicted functional associations between proteins , 2003, Nucleic Acids Res..

[17]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[18]  A. Owen,et al.  A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae) , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[19]  Carl Wu,et al.  Involvement of actin-related proteins in ATP-dependent chromatin remodeling. , 2003, Molecular cell.

[20]  T. Ito,et al.  Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[21]  M. Samanta,et al.  Predicting protein functions from redundancies in large-scale protein interaction networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[22]  D. Eisenberg,et al.  A combined algorithm for genome-wide prediction of protein function , 1999, Nature.

[23]  G. Sumara,et al.  A Probabilistic Functional Network of Yeast Genes , 2004 .

[24]  T. Hughes,et al.  Organization and Function of APT, a Subcomplex of the Yeast Cleavage and Polyadenylation Factor Involved in the Formation of mRNA and Small Nucleolar RNA 3′-Ends* , 2003, Journal of Biological Chemistry.

[25]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[26]  Stanley Letovsky,et al.  Predicting protein function from protein/protein interaction data: a probabilistic approach , 2003, ISMB.

[27]  C. Ball,et al.  Saccharomyces Genome Database. , 2002, Methods in enzymology.

[28]  D. Eisenberg,et al.  Computational methods of analysis of protein-protein interactions. , 2003, Current opinion in structural biology.

[29]  M. Gerstein,et al.  Integration of genomic datasets to predict protein complexes in yeast , 2004, Journal of Structural and Functional Genomics.

[30]  B. Snel,et al.  STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. , 2000, Nucleic acids research.

[31]  B. Schwikowski,et al.  A network of protein–protein interactions in yeast , 2000, Nature Biotechnology.

[32]  Roded Sharan,et al.  Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[33]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[34]  S. Kasif,et al.  Whole-genome annotation by using evidence integration in functional-linkage networks. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[35]  Alessandro Vespignani,et al.  Global protein function prediction from protein-protein interaction networks , 2003, Nature Biotechnology.