Protein Function Prediction Using Function Associations in Protein–Protein Interaction Network

In recent years, the rapid development of high-throughput technology has produced large amounts of protein–protein interaction (PPI) data and many unannotated protein sequences. Many approaches to protein function prediction that use PPI network information have been developed. Traditional methods usually exploit only the dependencies among interacting proteins that share the same function, so functions that are rarely linked to the same function across interactions are more difficult to predict. In multi-label settings, the dependencies among related instances with multiple labels are more complex, and using these associations appropriately can compensate for the shortcomings of traditional methods. In this paper, an iterative algorithm is applied to predict protein function based on a new network. The proposed method captures the dependencies among functions, derived from proteins and their interactions, and uses them for protein function prediction. The test results show that the algorithm performs better than most existing PPI-network-based algorithms, and that adding sequence similarity edges and spreading function information improves prediction performance. In addition, the dependencies among functions based on proteins and interactions can be effectively applied to the prediction of protein function.
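The idea of predicting a function from a different but associated function on neighbouring proteins can be illustrated with a minimal Python sketch. It is not the algorithm of the paper, whose details the abstract does not give. The function names `function_associations` and `predict`, the co-annotation counting scheme, the averaging update, and the toy proteins `P1` to `P5` are all assumptions made for illustration.

```python
from collections import defaultdict


def function_associations(edges, known):
    """Estimate how strongly function g on a protein suggests function f
    on an interacting protein, from co-annotation counts over edges.
    (Assumed scheme, not taken from the paper.)"""
    counts = defaultdict(lambda: defaultdict(float))
    for u, v in edges:
        if u in known and v in known:
            # Count the association in both directions of the edge.
            for a, b in ((u, v), (v, u)):
                for g in known[a]:
                    for f in known[b]:
                        counts[g][f] += 1.0
    assoc = {}
    for g, row in counts.items():
        total = sum(row.values())
        assoc[g] = {f: c / total for f, c in row.items()}
    return assoc


def predict(edges, known, iterations=10):
    """Iteratively score functions for unannotated proteins.
    Annotated proteins keep a fixed score of 1.0 for each known function."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    assoc = function_associations(edges, known)
    scores = {
        p: ({f: 1.0 for f in known[p]} if p in known else {})
        for p in neighbours
    }
    for _ in range(iterations):
        updated = {}
        for p in neighbours:
            if p in known:
                updated[p] = scores[p]
                continue
            acc = defaultdict(float)
            for q in neighbours[p]:
                # A neighbour's function g votes for every associated f.
                for g, s in scores[q].items():
                    for f, w in assoc.get(g, {}).items():
                        acc[f] += s * w
            n = len(neighbours[p])
            updated[p] = {f: v / n for f, v in acc.items()}
        scores = updated
    return scores


# Toy network (hypothetical): kinases interact with phosphatases,
# and the unannotated protein P5 interacts with the phosphatase P2.
edges = [("P1", "P2"), ("P3", "P4"), ("P5", "P2")]
known = {
    "P1": {"kinase"},
    "P2": {"phosphatase"},
    "P3": {"kinase"},
    "P4": {"phosphatase"},
}
scores = predict(edges, known)
best = max(scores["P5"], key=scores["P5"].get)
```

In this toy case a method that only propagates the same function between neighbours would assign `phosphatase` to `P5`, whereas the association-based vote favours `kinase`, because annotated kinases are the proteins observed interacting with phosphatases. Sequence similarity edges, which the abstract mentions, could be passed in as additional entries of `edges` under the same assumptions.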
