Sequence‐based prediction of protein–peptide binding sites using support vector machine

Protein–peptide interactions are essential for all cellular processes including DNA repair, replication, gene expression, and metabolism. As most protein–peptide interactions are uncharacterized, it is cost‐effective to investigate them computationally as a first step. All existing approaches for predicting protein–peptide binding sites, however, are based on protein structures, despite the fact that the structures of most proteins have not yet been solved. This article proposes SPRINT, the first machine‐learning method for Sequence‐based prediction of Protein–peptide Residue‐level Interactions. SPRINT yields robust and consistent performance in 10‐fold cross‐validation and on an independent test set. The most important feature is the evolution‐generated sequence profile. For the test set (1056 binding and non‐binding residues), it yields a Matthews correlation coefficient of 0.326 with a sensitivity of 64% and a specificity of 68%. This sequence‐based technique is comparable to or more accurate than structure‐based methods for peptide‐binding site prediction. SPRINT is available as an online server at: http://sparks-lab.org/. © 2016 Wiley Periodicals, Inc.
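The three evaluation measures quoted above (sensitivity, specificity, and the Matthews correlation coefficient) can all be computed from the counts in a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code. The function name `binary_metrics` and the residue counts are hypothetical, chosen only so that sensitivity and specificity come out at 64% and 68%; they are not the actual SPRINT test-set counts, so the resulting coefficient is not expected to match the reported 0.326 exactly.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    # Sensitivity: fraction of true binding residues predicted as binding.
    sensitivity = tp / (tp + fn)
    # Specificity: fraction of non-binding residues predicted as non-binding.
    specificity = tn / (tn + fp)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only (100 binding, 100 non-binding residues).
sens, spec, mcc = binary_metrics(tp=64, fp=32, tn=68, fn=36)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.3f}")
```

Because the Matthews correlation coefficient uses all four cells of the confusion matrix, it depends on the ratio of binding to non-binding residues as well as on sensitivity and specificity, which is why it is commonly reported alongside them for imbalanced residue-level prediction.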
