An iterative approach of protein function prediction

Background: Current approaches to predicting protein functions from a protein-protein interaction (PPI) dataset assume that the known functions of annotated proteins determine the functions of proteins whose functions are not yet known (un-annotated proteins). Function prediction is therefore treated as a one-directional, one-off procedure, from annotated proteins to un-annotated proteins. However, interactions between proteins are mutual rather than static and one-directional, even though the functions of some proteins are currently unknown. Consequently, when a similarity-based approach is used to predict the functions of un-annotated proteins, those proteins, once their functions are predicted, change the similarities between proteins, which in turn affects the prediction results. In other words, function prediction is a dynamic and mutual procedure. Existing prediction algorithms do not take this dynamic feature of protein interactions into account.

Results: In this paper, we propose a new approach that predicts protein functions iteratively. The iterative approach incorporates the dynamic and mutual features of protein interactions, as well as the local and global semantic influence of protein functions, into the prediction. To make iterative prediction possible, we propose a new protein similarity measure derived from protein functions. We adapt new evaluation metrics to assess the prediction quality of our algorithm and of other similar algorithms. Experiments on real PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions.

Conclusions: The iterative approach is more likely to reflect the real biological relationships between proteins when predicting functions. A proper definition of protein similarity based on protein functions is the key to predicting functions iteratively. The evaluation results show that in most cases the iterative approach outperformed non-iterative ones, with higher prediction quality in terms of precision, recall and F-value.
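
The feedback loop described above (predicted functions change protein similarities, which in turn change the next round of predictions) can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: the abstract does not define the proposed similarity, so the sketch assumes a Jaccard similarity between function sets, a neighbour weight of 1 + similarity, and a fixed score threshold. The names `jaccard`, `iterative_predict`, `threshold` and `max_rounds`, and the toy network, are all hypothetical.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two function sets (assumed stand-in similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def iterative_predict(
    ppi: Dict[str, Set[str]],
    annotations: Dict[str, Set[str]],
    threshold: float = 0.7,
    max_rounds: int = 10,
) -> Dict[str, Set[str]]:
    """Repeatedly re-predict functions of un-annotated proteins.

    ppi maps each protein to its interaction partners; annotations maps
    annotated proteins to their known functions. Known annotations are
    never changed; predictions are revised until they stop changing or
    max_rounds is reached.
    """
    current = {p: set(annotations.get(p, set())) for p in ppi}
    annotated = {p for p in ppi if annotations.get(p)}

    for _ in range(max_rounds):
        updated = {p: set(funcs) for p, funcs in current.items()}
        for p in ppi:
            if p in annotated:
                continue
            scores: Dict[str, float] = {}
            total = 0.0
            for q in ppi[p]:
                if not current[q]:
                    continue  # neighbour carries no functions yet
                # Assumed weighting: neighbours whose functions resemble
                # the current prediction for p count more.
                weight = 1.0 + jaccard(current[p], current[q])
                total += weight
                for f in current[q]:
                    scores[f] = scores.get(f, 0.0) + weight
            if total > 0.0:
                updated[p] = {f for f, s in scores.items() if s / total >= threshold}
        if updated == current:
            break  # predictions are stable
        current = updated
    return current


# Toy network (hypothetical): A and B are annotated, C and D are not.
toy_ppi = {
    "A": {"C"},
    "B": {"C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
toy_annotations = {"A": {"f1"}, "B": {"f1", "f2"}}

one_off = iterative_predict(toy_ppi, toy_annotations, max_rounds=1)
iterated = iterative_predict(toy_ppi, toy_annotations)
```

On this toy network a single pass assigns both f1 and f2 to D, because D's only informative neighbour at that point is B. Once C has been predicted as f1, later rounds reweight D's neighbours and f2 falls below the threshold, so the iterated result for D is f1 alone. The example only shows the mechanism; it says nothing about which prediction is biologically correct.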
