Large-scale identification of human protein function using topological features of interaction network

Annotating protein function is a vital step toward understanding life at the molecular level, and it is also valuable to the biomedical and pharmaceutical industries. Advances in sequencing technology keep widening the gap between the number of known sequences and the number with known functions, so computational methods for annotating protein function are indispensable. Here, a novel method is proposed to identify protein function based on a weighted human protein-protein interaction network and graph theory. Network topology features carrying both local and global information are used to characterise proteins. The minimum redundancy maximum relevance algorithm is used to select 227 optimized feature subsets, and support vector machines are used to build the prediction models. The performance of the method is assessed by 10-fold cross-validation, with accuracies ranging from 67.63% to 100%. Compared with other annotation methods, the proposed method achieves a 50% improvement in predictive accuracy. More generally, these network topology features provide insight into the relationship between protein functions and network architecture. The Matlab source code is freely available on request from the authors.
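
To make the idea of local and global topology features concrete, the sketch below computes a few common graph descriptors for each node of a small weighted network. It is an illustration only, not the authors' Matlab implementation: the abstract does not list the features used, so the choice of degree, strength, clustering coefficient and closeness is an assumption, as are the toy edge list, the conversion of an edge weight w into a path length 1/w, and every function name.

```python
import heapq

# Hypothetical toy network: weights stand in for interaction confidence scores in (0, 1].
EDGES = [
    ("A", "B", 0.9),
    ("A", "C", 0.8),
    ("B", "C", 0.7),
    ("C", "D", 0.5),
]


def build_graph(edges):
    """Build an undirected weighted adjacency map {node: {neighbour: weight}}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def clustering(graph, node):
    """Local feature: fraction of neighbour pairs that are themselves connected."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def shortest_distances(graph, source):
    """Dijkstra's algorithm; edge length is assumed to be 1 / weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + 1.0 / w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(graph, node):
    """Global feature: reachable node count divided by total shortest-path length."""
    dist = shortest_distances(graph, node)
    total = sum(d for n, d in dist.items() if n != node)
    return (len(dist) - 1) / total if total > 0 else 0.0


def topology_features(graph):
    """Collect one feature dictionary per node."""
    return {
        n: {
            "degree": len(graph[n]),
            "strength": sum(graph[n].values()),
            "clustering": clustering(graph, n),
            "closeness": closeness(graph, n),
        }
        for n in graph
    }


features = topology_features(build_graph(EDGES))
```

In a pipeline like the one described, per-protein vectors of this kind would be passed to feature selection and then to a classifier; those steps are not shown here.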
