Effectively predicting protein functions by collective classification — An extended abstract

High-throughput technologies have produced vast amounts of protein-protein interaction (PPI) data, and a number of approaches based on PPI networks have been proposed for protein function prediction. However, these approaches perform poorly when PPI information is insufficient. To address this issue, we propose a novel approach based on collective classification that combines protein sequence information with PPI information to improve prediction performance. We first reconstruct the PPI network by adding computed edges based on protein sequence similarity, and then apply a collective classification algorithm to predict protein function on the reconstructed network. Experiments on two real datasets demonstrate that our approach outperforms most existing approaches across a range of labeling conditions, especially in sparsely labeled networks, where existing approaches fail because of inadequate PPI information. The experimental results also show that our approach is robust to the number of labeled proteins in the PPI network.
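
The two-step idea described above (add similarity-based edges, then classify collectively) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the edge-selection rule, or the collective classification algorithm, so everything concrete here is an assumption made for illustration: the function names `augment_network` and `collective_classify`, the similarity threshold of 0.5, the precomputed similarity scores, and the use of a simple iterative weighted neighbor vote as the classifier. Each protein also receives a single label here, whereas real proteins usually carry several functions.

```python
from collections import defaultdict


def augment_network(ppi_edges, similarity, threshold=0.5):
    """Build a weighted adjacency map from PPI edges, then add an edge for
    every protein pair whose similarity score meets the threshold.
    (Hypothetical helper; the threshold value is an assumption.)"""
    adj = defaultdict(dict)
    for u, v in ppi_edges:
        adj[u][v] = 1.0  # observed interactions get full weight
        adj[v][u] = 1.0
    for (u, v), score in similarity.items():
        if score >= threshold and v not in adj[u]:
            adj[u][v] = score  # computed edge, weighted by similarity
            adj[v][u] = score
    return adj


def collective_classify(adj, known_labels, n_iter=10):
    """Iterative weighted neighbor vote: each unlabeled protein takes the
    label with the largest total edge weight among its labeled neighbors,
    and the pass repeats until no assignment changes.
    (A stand-in classifier, not necessarily the one used in the paper.)"""
    labels = dict(known_labels)
    unlabeled = sorted(n for n in adj if n not in known_labels)
    for _ in range(n_iter):
        changed = False
        for node in unlabeled:
            votes = defaultdict(float)
            for nbr, weight in adj[node].items():
                if nbr in labels:
                    votes[labels[nbr]] += weight
            if votes:
                # sorted() makes tie-breaking deterministic
                best = max(sorted(votes), key=votes.get)
                if labels.get(node) != best:
                    labels[node] = best
                    changed = True
        if not changed:
            break
    return labels


# Toy data (invented for illustration only).
ppi_edges = [("P1", "P2"), ("P3", "P4")]
similarity = {("P2", "P5"): 0.9, ("P4", "P5"): 0.3}
known = {"P1": "kinase", "P3": "transporter"}

network = augment_network(ppi_edges, similarity)
predicted = collective_classify(network, known)
```

In this toy example, `P5` has no observed interactions, so a method that relies on the PPI edges alone has nothing to propagate to it. The similarity edge to `P2` brings it into the network, and it inherits a label through `P2` on the first pass.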
