SPRINT: ultrafast protein-protein interaction prediction of the entire human interactome

Abstract

Background: Proteins usually perform their functions by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high rate of error. Many computational methods have been proposed, among which sequence-based ones are very promising. However, so far no such method is able to effectively predict the entire human interactome: they all require too much time or memory.

Results: We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions (PPIs). We comprehensively compare SPRINT with state-of-the-art programs on seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory.

Conclusion: SPRINT is the only sequence-based program that can effectively predict the entire human interactome: it requires between 15 and 100 min, depending on the dataset. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task.

Availability: The source code of SPRINT is freely available from https://github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/.
