FunPred 3.0: improved protein function prediction using protein interaction network

Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. With the advent of the post-genomic era, next-generation sequencing is performed routinely at population scale for a variety of species. The challenge is to determine, at scale, the functions of proteins that have not yet been characterized by detailed experimental studies. Experimental identification of protein function is laborious, time-consuming and resource-intensive. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present the improved protein function prediction tool FunPred 3.0, an extended version of our previous methodology FunPred 2, which exploits neighborhood properties in the protein–protein interaction network (PPIN) and physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae in the latest Munich Information Center for Protein Sequences (MIPS) dataset. The PPIN data of S. cerevisiae in the MIPS dataset includes 4,554 unique proteins in 13,528 protein–protein interactions after elimination of the self-replicating and self-interacting protein pairs. Using FunPred 3.0, we achieve mean precision, recall and F-score values of 0.55, 0.82 and 0.66, respectively. FunPred 3.0 is then used to predict the functions of previously unpredicted protein pairs (those with incomplete or missing functional annotations) in the MIPS dataset of S. cerevisiae. The method is also capable of predicting the subcellular localization of proteins along with their corresponding functions. The code and the complete prediction results are freely available at: https://github.com/SovanSaha/FunPred-3.0.git.
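To make the ideas in the abstract concrete, the following is a minimal sketch of generic neighborhood-based function prediction on a toy interaction network, together with the precision, recall and F-score metrics. It is not the FunPred 3.0 algorithm: the simple neighbor majority vote, the function names, the protein identifiers and the annotations are all illustrative assumptions, and the sketch ignores the physicochemical features the method also uses.

```python
from collections import Counter


def clean_interactions(pairs):
    """Drop self-interactions and duplicate (unordered) protein pairs."""
    edges = set()
    for a, b in pairs:
        if a == b:  # self-interacting pair: discard
            continue
        edges.add(frozenset((a, b)))  # (A, B) and (B, A) collapse to one edge
    return edges


def build_neighbors(edges):
    """Map each protein to the set of its direct interaction partners."""
    neighbors = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def predict_by_neighbor_vote(protein, neighbors, annotations, top_k=1):
    """Assumed baseline: return the top_k functions most common among neighbors."""
    votes = Counter()
    for partner in neighbors.get(protein, ()):
        votes.update(annotations.get(partner, ()))
    # Sort by vote count (descending), then by name so ties are deterministic.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return {function for function, _ in ranked[:top_k]}


def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f(predicted, actual):
    """Per-protein precision, recall and F-score for sets of function labels."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall, f_score(precision, recall)


# Hypothetical toy data: proteins A-D and their functional annotations.
raw_pairs = [("A", "B"), ("B", "A"), ("A", "A"), ("A", "C"), ("C", "D")]
annotations = {"B": {"metabolism"}, "C": {"metabolism", "transport"}}

edges = clean_interactions(raw_pairs)
neighbors = build_neighbors(edges)
predicted = predict_by_neighbor_vote("A", neighbors, annotations)
scores = precision_recall_f(predicted, {"metabolism", "energy"})

# The F-score of the reported mean precision and recall is about 0.66.
reported_f = round(f_score(0.55, 0.82), 2)
```

On the toy data, the duplicate and self-interacting pairs are removed before voting, mirroring the filtering step described for the MIPS network. The last line checks that the reported precision of 0.55 and recall of 0.82 are consistent with the reported F-score of 0.66 under the harmonic-mean definition.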
