Context-specific protein function prediction.

Although whole-genome sequencing of many organisms has been completed, numerous newly discovered genes still have no known function. Because it is often not feasible to annotate such genes by sequence-based approaches alone, high-throughput data such as protein-protein interaction (PPI) information have been proposed as a basis for assigning putative functions to them. In addition to PPI data, information such as protein localization within a cell may be employed to improve protein function prediction in two ways: (1) by using localization as a direct indicator of protein function (e.g. nucleolus-localized proteins might be involved in ribosome biogenesis), and (2) by using localization to refine noisy PPI data. In the latter case, localization information may be used to distinguish two types of PPIs: interactions between co-localized proteins (more reliable) and interactions between differently localized proteins (potentially less reliable). In this paper, we propose a probabilistic method to predict protein function from PPI data and localization information. A Bayesian network is used to model the dependencies between protein function, PPI data and localization information. Our cross-validation experiment shows that, in some cases, our method (conditioning PPI data on localization information) significantly improves prediction precision compared to a simple Naive Bayes method, which assumes that PPI data and localization information are conditionally independent given protein function. Finally, we predict 57 genes of unknown function to be "ribosome biogenesis" proteins.
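The difference between the two factorizations named in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the binary PPI feature (whether a protein has an interaction partner annotated with the function), the two localization classes, the Laplace smoothing, and all counts below are assumptions made up for illustration. The Naive Bayes variant scores a protein with P(F) P(L|F) P(PPI|F), while the conditioned variant uses P(F) P(L|F) P(PPI|F, L).

```python
from collections import Counter

# Hypothetical localization classes (assumption, not from the paper).
LOCS = ("nucleolus", "cytoplasm")

# Made-up training records: (has_function, has_annotated_partner, localization).
# In this toy data the PPI feature is informative mainly in the nucleolus.
records = (
    [(1, 1, "nucleolus")] * 8 + [(1, 0, "nucleolus")] * 2
    + [(0, 1, "nucleolus")] * 1 + [(0, 0, "nucleolus")] * 4
    + [(1, 1, "cytoplasm")] * 2 + [(1, 0, "cytoplasm")] * 2
    + [(0, 1, "cytoplasm")] * 10 + [(0, 0, "cytoplasm")] * 10
)


def smoothed(num, den, k, alpha=1.0):
    """Laplace-smoothed estimate for a variable with k possible outcomes."""
    return (num + alpha) / (den + alpha * k)


def posterior(records, ppi, loc, conditioned):
    """Return P(F=1 | ppi, loc) under one of the two factorizations.

    conditioned=False: Naive Bayes, P(F) * P(loc | F) * P(ppi | F)
    conditioned=True:  PPI conditioned on localization,
                       P(F) * P(loc | F) * P(ppi | F, loc)
    """
    counts = Counter(records)
    total = sum(counts.values())
    scores = {}
    for f in (0, 1):
        n_f = sum(c for (ff, _, _), c in counts.items() if ff == f)
        n_f_loc = sum(c for (ff, _, ll), c in counts.items()
                      if ff == f and ll == loc)
        prior = smoothed(n_f, total, 2)
        p_loc = smoothed(n_f_loc, n_f, len(LOCS))
        if conditioned:
            # PPI evidence is estimated separately for each localization.
            p_ppi = smoothed(counts[(f, ppi, loc)], n_f_loc, 2)
        else:
            # PPI evidence is pooled over all localizations.
            n_f_ppi = sum(c for (ff, pp, _), c in counts.items()
                          if ff == f and pp == ppi)
            p_ppi = smoothed(n_f_ppi, n_f, 2)
        scores[f] = prior * p_loc * p_ppi
    return scores[1] / (scores[0] + scores[1])


naive = posterior(records, ppi=1, loc="nucleolus", conditioned=False)
cond = posterior(records, ppi=1, loc="nucleolus", conditioned=True)
print(f"Naive Bayes: {naive:.3f}  conditioned on localization: {cond:.3f}")
```

On this toy data the conditioned model assigns a higher posterior to a nucleolar protein with an annotated interaction partner, because it does not dilute the nucleolar PPI signal with the noisier cytoplasmic interactions. Whether such a gain appears on real data depends on how strongly the reliability of PPIs varies with localization.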
