Review and comparative assessment of sequence‐based predictors of protein‐binding residues

Understanding the molecular mechanisms that govern protein‐protein interactions, and accurately modeling protein‐protein docking, both rely on accurate identification and prediction of protein‐binding partners and protein‐binding residues. We review over 40 methods that predict protein‐protein interactions from protein sequences, including methods that predict interacting protein pairs, protein‐binding residues for a pair of interacting sequences, and protein‐binding residues in a single protein chain. We focus on the last group of methods, which provide residue‐level annotations and can be broadly applied to all protein sequences. We compare their architectures, inputs and outputs, and we discuss aspects related to their assessment and availability. We also perform a first‐of‐its‐kind comprehensive empirical comparison of representative predictors of protein‐binding residues using a novel and high‐quality benchmark data set. We show that the selected predictors accurately discriminate protein‐binding and non‐binding residues and that newer methods outperform older designs. However, these methods are unable to accurately separate residues that bind other molecules, such as DNA, RNA and small ligands, from the protein‐binding residues. This cross‐prediction, defined as the incorrect prediction of nucleic‐acid‐ and small‐ligand‐binding residues as protein binding, is substantial for all evaluated methods and is not driven by the proximity to the native protein‐binding residues. We discuss reasons for this drawback and we offer several recommendations. In particular, we postulate the need for a new generation of more accurate predictors and data sets, inclusion of a comprehensive assessment of the cross‐predictions in future studies and higher standards of availability of the published methods.
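
The definition of cross‐prediction above can be illustrated with a minimal sketch. It assumes per‐residue binary labels and binary predictions, and it counts only residues that bind nucleic acids or small ligands but not proteins; the function name `cross_prediction_rate`, the argument names and this exact normalization are illustrative assumptions, not necessarily the measure computed in the study.

```python
from typing import Sequence


def cross_prediction_rate(
    predicted: Sequence[int],
    protein_binding: Sequence[int],
    other_binding: Sequence[int],
) -> float:
    """Fraction of residues that bind other molecules (DNA, RNA, small
    ligands) but not proteins, yet are predicted as protein-binding.

    All three inputs are per-residue 0/1 annotations of equal length.
    Hypothetical helper for illustration only.
    """
    if not (len(predicted) == len(protein_binding) == len(other_binding)):
        raise ValueError("all inputs must have the same length")

    cross_predicted = 0  # other-ligand residues flagged as protein-binding
    other_only = 0       # residues that bind other ligands but not proteins
    for pred, prot, other in zip(predicted, protein_binding, other_binding):
        if other and not prot:
            other_only += 1
            if pred:
                cross_predicted += 1

    if other_only == 0:
        raise ValueError("no residues bind only other ligands")
    return cross_predicted / other_only


# Toy chain of six residues (made-up labels, not data from the study).
predicted = [1, 1, 0, 1, 0, 0]
protein_binding = [1, 0, 0, 0, 0, 1]
other_binding = [0, 1, 1, 1, 0, 0]
rate = cross_prediction_rate(predicted, protein_binding, other_binding)
print(f"cross-prediction rate: {rate:.2f}")
```

In this toy chain, three residues bind only other ligands and two of them are predicted as protein‐binding. A predictor that separates binding types well would keep this rate close to its false‐positive rate on residues that bind nothing.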
