ProteinPrompt: a webserver for predicting protein–protein interactions

Motivation: Protein–protein interactions play an essential role in a great variety of cellular processes and are therefore of significant interest for the design of new therapeutic compounds as well as for the identification of side effects caused by unexpected binding. Here, we present ProteinPrompt, a webserver that uses machine-learning algorithms to predict specific, currently unknown protein–protein interactions. The tool is designed to predict contacts quickly and reliably from an input sequence, so that large sequence libraries can be scanned for potential binding partners. The goal is to accelerate the laborious process of drug target identification and to assure its quality.

Methods: We collected and thoroughly filtered a comprehensive database of known contacts from several sources; it is available for download. ProteinPrompt provides two complementary search methods of similar accuracy for comparison and consensus building. The default method is a random forest algorithm that uses the auto-correlations of seven amino acid scales. Alternatively, a graph neural network implementation can be selected. For each query sequence, potential binding partners are identified from a protein sequence database. The proteomes of several organisms are available and can be searched for contacts.

Results: To evaluate the predictive power of the algorithms, we prepared a test dataset that was rigorously filtered for redundancy. No sequence pairs similar to the ones used for training were included in this dataset. On this challenging dataset, the random forest method achieved an accuracy of 0.88 and an area under the curve of 0.95. The graph neural network achieved an accuracy of 0.86 on the same dataset. Since the underlying learning approaches are unrelated, comparing the results of the random forest and the graph neural network reduces the likelihood of errors. ProteinPrompt is available online at: http://proteinformatics.org/ProteinPrompt. The server makes it possible to scan the human proteome for potential binding partners of an input sequence within minutes.

Conclusion: We offer a fast, accurate, easy-to-use online service for predicting binding partners from an input sequence.
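To illustrate what auto-correlation features over an amino acid scale can look like, the following is a minimal sketch and not the ProteinPrompt implementation. It assumes a mean-centred auto-covariance as the definition of auto-correlation, uses the Kyte-Doolittle hydropathy scale as a stand-in for one of the seven scales (which are not named in this abstract), and picks an arbitrary maximum lag. The function names `autocorrelation` and `pair_features` are hypothetical.

```python
from statistics import fmean

# Kyte-Doolittle hydropathy values, used here only as an example scale.
# Assumption: the seven scales used by the server are not listed in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(sequence, scale, max_lag=5):
    """Return one mean-centred auto-covariance value per lag 1..max_lag.

    Assumed definition: AC(lag) = sum_i (x_i - m)(x_{i+lag} - m) / (n - lag),
    where x_i is the scale value of residue i and m is the sequence mean.
    """
    values = [scale[residue] for residue in sequence]  # KeyError on unknown residues
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = fmean(values)
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def pair_features(seq_a, seq_b, scale, max_lag=5):
    """Concatenate the descriptors of two sequences into one fixed-length vector."""
    return autocorrelation(seq_a, scale, max_lag) + autocorrelation(seq_b, scale, max_lag)
```

Because the descriptor length depends only on `max_lag` and not on sequence length, vectors of this kind can be passed to a fixed-input classifier such as a random forest. Repeating the calculation for each scale and concatenating the results would give one feature vector per protein pair.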
