Integrative Protein Function Transfer Using Factor Graphs and Heterogeneous Data Sources

We propose a novel approach to predicting the functions of an organism's proteins by coupling sequence homology and protein-protein interaction (PPI) data from two or more species with multi-functional Gene Ontology information in a single computational model. Instead of using the network of one organism in isolation, we join the networks of different species through inter-species sequence homology links of sufficient similarity. As a consequence, knowledge of a protein's function is drawn not only from its own species' network but also, through homology links, from the networks of other species. We apply our method to two of the largest protein networks, Yeast (Saccharomyces cerevisiae) and Fly (Drosophila melanogaster). Our joint Fly-Yeast network shows statistically significant improvements in precision, accuracy, and false positive rate over networks that consider either source in isolation, while retaining the computational efficiency of the simpler models.
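The joining step described above can be illustrated with a minimal sketch. It covers only the construction of a joint network from per-species PPI edges and thresholded homology links, not the factor graph inference itself. The function name `build_joint_network`, the protein identifiers, the similarity scale, and the cutoff value are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def build_joint_network(ppi_by_species, homology_links, min_similarity):
    """Join per-species PPI networks with cross-species homology edges.

    ppi_by_species: {species: [(protein_a, protein_b), ...]}
    homology_links: [((species_a, protein_a), (species_b, protein_b), score), ...]
    min_similarity: hypothetical cutoff standing in for "sufficient similarity"
    """
    adjacency = defaultdict(set)

    # Within-species edges: nodes are (species, protein) pairs so that
    # identically named proteins in different species stay distinct.
    for species, edges in ppi_by_species.items():
        for a, b in edges:
            u, v = (species, a), (species, b)
            adjacency[u].add((v, "ppi"))
            adjacency[v].add((u, "ppi"))

    # Cross-species edges: keep only homology links that connect two
    # different species and meet the similarity cutoff.
    for u, v, score in homology_links:
        if u[0] != v[0] and score >= min_similarity:
            adjacency[u].add((v, "homology"))
            adjacency[v].add((u, "homology"))

    return dict(adjacency)


# Toy data with made-up identifiers and scores (not from the paper).
ppi = {
    "yeast": [("Y1", "Y2")],
    "fly": [("F1", "F2")],
}
homology = [
    (("yeast", "Y1"), ("fly", "F1"), 0.9),  # kept: above the cutoff
    (("yeast", "Y2"), ("fly", "F2"), 0.2),  # dropped: below the cutoff
]
joint = build_joint_network(ppi, homology, min_similarity=0.5)
```

In this toy example, the yeast protein `Y1` ends up with two neighbors, its interaction partner `Y2` and its fly homolog `F1`, so evidence about function could in principle reach it from either species.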
