Homology-based prediction of interactions between proteins using Averaged One-Dependence Estimators

BackgroundIdentification of protein-protein interactions (PPIs) is essential for a better understanding of biological processes, pathways and functions. However, experimental identification of the complete set of PPIs in a cell/organism (“an interactome”) is still a difficult task. To circumvent limitations of current high-throughput experimental techniques, it is necessary to develop high-performance computational methods for predicting PPIs.ResultsIn this article, we propose a new computational method to predict interaction between a given pair of protein sequences using features derived from known homologous PPIs. The proposed method is capable of predicting interaction between two proteins (of unknown structure) using Averaged One-Dependence Estimators (AODE) and three features calculated for the protein pair: (a) sequence similarities to a known interacting protein pair (FSeq), (b) statistical propensities of domain pairs observed in interacting proteins (FDom) and (c) a sum of edge weights along the shortest path between homologous proteins in a PPI network (FNet). Feature vectors were defined to lie in a half-space of the symmetrical high-dimensional feature space to make them independent of the protein order. The predictability of the method was assessed by a 10-fold cross validation on a recently created human PPI dataset with randomly sampled negative data, and the best model achieved an Area Under the Curve of 0.79 (pAUC0.5% = 0.16). In addition, the AODE trained on all three features (named PSOPIA) showed better prediction performance on a separate independent data set than a recently reported homology-based method.ConclusionsOur results suggest that FNet, a feature representing proximity in a known PPI network between two proteins that are homologous to a target protein pair, contributes to the prediction of whether the target proteins interact or not. PSOPIA will help identify novel PPIs and estimate complete PPI networks. The method proposed in this article is freely available on the web at http://mizuguchilab.org/PSOPIA.

[1]  Mark Gerstein,et al.  Predicting interactions in protein networks by completing defective cliques , 2006, Bioinform..

[2]  Jiangning Song,et al.  Conditional random field approach to prediction of protein-protein interactions using domain information , 2011, BMC Systems Biology.

[3]  Jianhua Ruan,et al.  Building and analyzing protein interactome networks by cross-species comparisons , 2010, BMC Systems Biology.

[4]  Yanzhi Guo,et al.  Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences , 2008, Nucleic acids research.

[5]  J. Silberg,et al.  A transposase strategy for creating libraries of circularly permuted proteins , 2012, Nucleic acids research.

[6]  Juwen Shen,et al.  Predicting protein–protein interactions based only on sequences information , 2007, Proceedings of the National Academy of Sciences.

[7]  Jean-Loup Faulon,et al.  Predicting protein-protein interactions using signature products , 2005, Bioinform..

[8]  Rafael C. Jimenez,et al.  The IntAct molecular interaction database in 2012 , 2011, Nucleic Acids Res..

[9]  William Stafford Noble,et al.  Kernel methods for predicting protein-protein interactions , 2005, ISMB.

[10]  Kenji Mizuguchi,et al.  Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites , 2010, Bioinform..

[11]  Arnaud Céol,et al.  3did: identification and classification of domain-based interactions of known three-dimensional structure , 2010, Nucleic Acids Res..

[12]  Baldomero Oliva,et al.  Biana: a software framework for compiling biological interactions and analyzing networks , 2010, BMC Bioinformatics.

[13]  E. Sprinzak,et al.  Correlated sequence-signatures as markers of protein-protein interaction. , 2001, Journal of molecular biology.

[14]  Yungki Park,et al.  Revisiting the negative example sampling problem for predicting protein-protein interactions , 2011, Bioinform..

[15]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[16]  Robert D. Finn,et al.  iPfam: visualization of protein?Cprotein interactions in PDB at domain and amino acid resolutions , 2005, Bioinform..

[17]  Yungki Park,et al.  Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences , 2009, BMC Bioinformatics.

[18]  M. Gerstein,et al.  Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. , 2004, Genome research.

[19]  Geoffrey I. Webb,et al.  Not So Naive Bayes: Aggregating One-Dependence Estimators , 2005, Machine Learning.

[20]  Loris Nanni,et al.  An ensemble of K-local hyperplanes for predicting protein-protein interactions , 2006, Bioinform..

[21]  T. Lane,et al.  Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions , 2009, PloS one.

[22]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[23]  Darby Tien-Hao Chang,et al.  Predicting protein-protein interactions in unbalanced data using the primary structure of proteins , 2010, BMC Bioinformatics.

[24]  Yangchao Huang,et al.  Simple sequence-based kernels do not predict protein-protein interactions , 2010, Bioinform..

[25]  Ashkan Golshani,et al.  Short Co-occurring Polypeptide Regions Can Predict Global Protein Interaction Maps , 2012, Scientific Reports.

[26]  E. Birney,et al.  Pfam: the protein families database , 2013, Nucleic Acids Res..

[27]  Albert Chan,et al.  PIPE: a protein-protein interaction prediction engine based on the re-occurring short polypeptide sequences between known interacting protein pairs , 2006, BMC Bioinformatics.

[28]  J. R. Green,et al.  Global investigation of protein–protein interactions in yeast Saccharomyces cerevisiae using re-occurring short polypeptide sequences , 2008, Nucleic acids research.

[29]  M. Vidal,et al.  Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs". , 2001, Genome research.

[30]  J Douglas Armstrong,et al.  Bio::Homology::InterologWalk - A Perl module to build putative protein-protein interaction networks through interolog mapping , 2011, BMC Bioinformatics.

[31]  M. Vidal,et al.  Effect of sampling on topology predictions of protein-protein interaction networks , 2005, Nature Biotechnology.

[32]  Geoffrey I. Webb,et al.  Learning by extrapolation from marginal to full-multivariate probability distributions: decreasingly naive Bayesian classification , 2011, Machine Learning.

[33]  Baldomero Oliva,et al.  BIPS: BIANA Interolog Prediction Server. A tool for protein–protein interaction inference , 2012, Nucleic Acids Res..

[34]  Beatriz García-Jiménez,et al.  Inference of Functional Relations in Predicted Protein Networks with a Machine Learning Approach , 2010, PloS one.

[35]  George Hripcsak,et al.  Technical Brief: Agreement, the F-Measure, and Reliability in Information Retrieval , 2005, J. Am. Medical Informatics Assoc..

[36]  B. Matthews Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.

[37]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[38]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[39]  B. Snel,et al.  Comparative assessment of large-scale data sets of protein–protein interactions , 2002, Nature.

[40]  Dmitrij Frishman,et al.  The Negatome database: a reference set of non-interacting protein pairs , 2009, Nucleic Acids Res..

[41]  David A. Gough,et al.  Predicting protein-protein interactions from primary structure , 2001, Bioinform..

[42]  The UniProt Consortium,et al.  Reorganizing the protein space at the Universal Protein Resource (UniProt) , 2011, Nucleic Acids Res..

[43]  William Stafford Noble,et al.  Learning to predict protein-protein interactions from protein sequences , 2003, Bioinform..

[44]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[45]  Chun-Yu Lin,et al.  PPISearch: a web server for searching homologous protein–protein interactions across multiple species , 2009, Nucleic Acids Res..

[46]  Menglong Li,et al.  PRED_PPI: a server for predicting protein-protein interactions based on sequence data with probability assignment , 2010, BMC Research Notes.

[47]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.