LocFuse: human protein-protein interaction prediction via classifier fusion using protein localization information.

Protein-protein interaction (PPI) detection is one of the central goals of functional genomics and systems biology. Knowledge about the nature of PPIs can help fill the widening gap between sequence information and functional annotation. Although experimental methods have produced valuable PPI data, they suffer from significant limitations, and computational PPI prediction methods have therefore attracted considerable attention. Despite these efforts, PPI prediction is still in its infancy for complex multicellular organisms such as humans. Here, we propose a novel ensemble learning method, LocFuse, for human PPI prediction. The method uses eight different genomic and proteomic features along with four different types of classifiers. The prediction performance of this classifier selection method was found to be considerably better than that of previously used methods. This confirms the complex nature of the PPI prediction problem and the necessity of using biological information for classifier fusion. LocFuse is available at: http://lbb.ut.ac.ir/Download/LBBsoft/LocFuse.

BIOLOGICAL SIGNIFICANCE: The results revealed that dividing the proteome space according to the cellular localization of proteins can improve the utility of some classifiers in PPI prediction. Therefore, to predict the interaction for any given protein pair, the most accurate classifier can be selected according to the pair's cellular localization information. The results also indicate that the importance of different features for PPI prediction varies between differently localized proteins; in general, however, our novel features, which were extracted from position-specific scoring matrices (PSSMs), are the most important ones, and the Random Forest (RF) classifier performs best in most cases. LocFuse was developed with a user-friendly graphical interface and is freely available for Linux, Mac OS X and MS Windows operating systems.
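The localization-based classifier selection idea described in the abstract can be illustrated with a minimal sketch. This is not the LocFuse implementation: the function names (`select_classifiers`, `predict`), the two toy threshold rules standing in for trained classifiers, the feature names, and the validation data are all hypothetical and chosen only to show the selection step. The sketch assumes that each protein pair carries a pair of localization labels and that the "most accurate" classifier for a localization pair is the one with the highest accuracy on held-out pairs with the same localization.

```python
from collections import defaultdict


def select_classifiers(classifiers, validation_set):
    """For each localization pair, pick the classifier with the best validation accuracy.

    classifiers: dict mapping a name to a callable(features) -> bool
    validation_set: iterable of (localization_pair, features, label)
    """
    correct = defaultdict(lambda: defaultdict(int))  # correct[loc][name] = hits
    total = defaultdict(int)                         # total[loc] = number of pairs
    for loc_pair, features, label in validation_set:
        key = frozenset(loc_pair)  # the order of the two proteins is irrelevant
        total[key] += 1
        for name, clf in classifiers.items():
            if clf(features) == label:
                correct[key][name] += 1
    return {
        key: max(classifiers, key=lambda name: correct[key][name] / total[key])
        for key in total
    }


def predict(loc_pair, features, classifiers, selected, default):
    """Route a protein pair to the classifier selected for its localization pair."""
    name = selected.get(frozenset(loc_pair), default)
    return classifiers[name](features)


# Hypothetical stand-ins for trained classifiers: simple threshold rules.
classifiers = {
    "pssm_rule": lambda f: f["pssm"] > 0.5,
    "go_rule": lambda f: f["go"] > 0.5,
}

# Hypothetical held-out pairs: (localization pair, features, interacts?)
validation_set = [
    (("nucleus", "nucleus"), {"pssm": 0.9, "go": 0.2}, True),
    (("nucleus", "nucleus"), {"pssm": 0.1, "go": 0.7}, False),
    (("cytoplasm", "membrane"), {"pssm": 0.8, "go": 0.1}, False),
    (("membrane", "cytoplasm"), {"pssm": 0.3, "go": 0.9}, True),
]

selected = select_classifiers(classifiers, validation_set)
```

With `selected` in hand, a new pair is scored with `predict(("membrane", "cytoplasm"), features, classifiers, selected, default="pssm_rule")`; pairs whose localization combination was not seen during validation fall back to the `default` classifier, which is a design choice of this sketch rather than something stated in the abstract.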
