PPCM: Combining Multiple Classifiers to Improve Protein-Protein Interaction Prediction

Determining protein-protein interactions (PPIs) in biological systems is of considerable importance, and PPI prediction has become a popular research area. Although various classifiers have been developed for PPI prediction, no single classifier appears able to predict PPIs with high confidence. We postulated that combining individual classifiers could improve the accuracy of PPI prediction. We developed a method called the protein-protein interaction prediction classifiers merger (PPCM), which combines the output of two PPI prediction tools, GO2PPI and Phyloprof, using the Random Forests algorithm. We evaluated PPCM by area under the curve (AUC) on an assembled Gold Standard database that contains both positive and negative PPI pairs. The AUC tests showed that PPCM significantly improved prediction accuracy over the corresponding individual classifiers, and that incorporating additional classifiers into PPCM could improve accuracy further. Furthermore, cross-species PPCM achieved prediction accuracy competitive with, and in some cases better than, single-species PPCM. This study establishes a robust pipeline for PPI prediction that integrates multiple classifiers using the Random Forests algorithm. The pipeline will be useful for predicting PPIs in non-model species.
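The AUC comparison described above can be illustrated with a minimal sketch. It assumes that AUC here means the area under the ROC curve, computed as the fraction of positive/negative pairs that a score ranks correctly. The function name `auc`, the toy scores, and the variables `tool_a`, `tool_b` and `merged` are hypothetical, and the plain average is only a placeholder for a learned merger; it is not the Random Forests step used by PPCM.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U).

    labels: 1 for an interacting pair, 0 for a non-interacting pair.
    scores: classifier output, higher meaning "more likely to interact".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative pair")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positive and three negative protein pairs.
labels = [1, 1, 1, 0, 0, 0]
tool_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]  # stand-in for one tool's scores
tool_b = [0.3, 0.9, 0.7, 0.2, 0.6, 0.1]  # stand-in for the other tool's scores

# Placeholder merger (assumption): a plain average of the two scores,
# used only to show how a combined score is evaluated with the same metric.
merged = [(a + b) / 2 for a, b in zip(tool_a, tool_b)]

if __name__ == "__main__":
    for name, s in [("tool_a", tool_a), ("tool_b", tool_b), ("merged", merged)]:
        print(f"{name}: AUC = {auc(labels, s):.3f}")
```

On this hand-picked toy set each stand-in tool ranks 8 of the 9 positive/negative pairs correctly (AUC 0.889), while the averaged score ranks all 9 correctly (AUC 1.000). The numbers only illustrate the metric and say nothing about the performance of GO2PPI, Phyloprof or PPCM.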
