Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences

Background: Protein-protein interactions underlie many important biological processes. Computational prediction methods can complement experimental approaches for identifying protein-protein interactions. Recently, a new category of sequence-based prediction methods has been put forward, distinctive in that it does not require homologous protein sequences. This makes these methods applicable to all protein sequences, unlike many previous sequence-based prediction methods. If as effective as claimed, these universally applicable methods would be useful in many areas of biological research.

Results: On close survey, I found that many of these new methods were inadequately tested. In addition, newer methods were often published without performance comparison with previous ones. Thus, it is not clear how good they are or whether there are significant performance differences among them. In this study, I implemented and thoroughly tested 4 different methods on large-scale, non-redundant data sets. The tests reveal several important points. First, performance differs significantly among the methods. Second, the data sets typically used for training prediction methods appear significantly biased, which limits the general applicability of methods trained on them. Third, there is still ample room for further development. In addition, my analysis illustrates the importance of complementary performance measures, coupled with right-sized data sets, for meaningful benchmark tests.

Conclusions: The current study reveals the potential and the limits of this new category of sequence-based protein-protein interaction prediction methods, which in turn provides firm ground for future endeavours in this important area of contemporary bioinformatics.
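The point about complementary performance measures can be illustrated with a minimal Python sketch. The labels below are made-up toy values, not data from this study, and the choice of accuracy, precision, and recall is an assumption for illustration; the abstract does not name the measures actually used.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = interacting)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def measures(y_true, y_pred):
    """Return accuracy, precision, and recall as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy set: 10 interacting pairs among 1000 candidate pairs.
y_true = [1] * 10 + [0] * 990
# Hypothetical predictor: recovers 8 of the 10 true pairs,
# but also flags 40 non-interacting pairs as interacting.
y_pred = [1] * 8 + [0] * 2 + [1] * 40 + [0] * 950

scores = measures(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

On this toy set the accuracy is high because non-interacting pairs dominate, while the precision is low: most predicted interactions are wrong. A single measure reported on a data set of unrepresentative size or class balance can therefore give a misleading picture.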
