Predicting protein-peptide binding sites with a Deep Convolutional Neural Network.

MOTIVATION: Interactions between proteins and peptides play a role in many biological functions. Predicting such biomolecular interactions can speed up disease prevention and aid drug discovery. Experimental methods for determining protein-peptide binding sites are costly and time-consuming, so computational methods have become prevalent. However, existing models detect actual peptide binding sites in proteins at extremely low rates. To address this problem, we employed a two-stage technique: first, we extracted relevant features from protein sequences and transformed them into images using a novel method; then, we applied a convolutional neural network to these images to identify peptide binding sites in proteins.

RESULTS: Our approach achieves 67% sensitivity (recall, or true positive rate), surpassing existing methods by over 35%.
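The headline result is stated as sensitivity, i.e. the fraction of true binding residues that the predictor recovers: TP / (TP + FN). The sketch below shows how this residue-level metric is computed, alongside specificity (TN / (TN + FP)), assuming binary per-residue labels where 1 marks a peptide-binding residue. The function names and the toy labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Sequence, Tuple


def _confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FN, TN, FP for binary residue labels (1 = binding, 0 = non-binding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sensitivity (recall, true positive rate) = TP / (TP + FN)."""
    tp, fn, _, _ = _confusion(y_true, y_pred)
    if tp + fn == 0:
        raise ValueError("y_true contains no binding residues")
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Specificity (true negative rate) = TN / (TN + FP)."""
    _, _, tn, fp = _confusion(y_true, y_pred)
    if tn + fp == 0:
        raise ValueError("y_true contains no non-binding residues")
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical labels for a 10-residue stretch: 3 true binding residues.
    y_true = [0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
    y_pred = [0, 1, 0, 0, 1, 1, 0, 0, 0, 0]
    print(f"sensitivity = {sensitivity(y_true, y_pred):.3f}")  # 2 of 3 found -> 0.667
    print(f"specificity = {specificity(y_true, y_pred):.3f}")  # 6 of 7 -> 0.857
```

Because a predictor that labels every residue as binding would trivially reach a sensitivity of 1.0, sensitivity is normally read together with a measure such as specificity computed on the same predictions.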
