Predicting Hot Spots Using a Deep Neural Network Approach

Targeting protein-protein interactions is a challenging yet crucial task in the drug discovery process. A good starting point for rational drug design is the identification of hot spots (HS) at protein-protein interfaces, which are typically conserved residues that contribute most significantly to binding. In this chapter, we describe step by step an in-house pipeline for HS prediction that uses only sequence-based features from the well-known SpotOn dataset of soluble proteins (Moreira et al., Sci Rep 7:8007, 2017) and a deep neural network. The pipeline is divided into three steps: (1) feature extraction, (2) deep learning classification, and (3) model evaluation. We present all the available resources, including code snippets, the main dataset, and the free and open-source modules/packages necessary for full replication of the protocol. Users should be able to develop an HS prediction model with accuracy, precision, recall, and AUROC of 0.96, 0.93, 0.91, and 0.86, respectively.
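To make steps (1) and (3) more concrete, the following is a minimal sketch in plain Python, not the chapter's actual code. It assumes a one-hot encoding of a residue window as a stand-in for the sequence-based features (the abstract does not specify which features are used), and it computes the four reported metrics from hypothetical labels and scores. The names `one_hot_window`, `binary_metrics`, and `auroc` are illustrative only.

```python
from typing import Dict, List, Sequence

# The 20 standard amino acids (one-letter codes); the ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_window(sequence: str, position: int, half_width: int = 2) -> List[int]:
    """One-hot encode the residues in a window centred on `position` (0-based).

    Window slots that fall outside the sequence, or that hold a
    non-standard residue, are encoded as all-zero vectors.
    This encoding is an assumption, not the chapter's feature set.
    """
    features: List[int] = []
    for i in range(position - half_width, position + half_width + 1):
        vec = [0] * len(AMINO_ACIDS)
        if 0 <= i < len(sequence):
            idx = AMINO_ACIDS.find(sequence[i])
            if idx >= 0:
                vec[idx] = 1
        features.extend(vec)
    return features


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for 0/1 labels (1 = hot spot)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. This pairwise form is O(n^2)
    and is meant for illustration on small examples only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy inputs, not taken from the SpotOn dataset.
    x = one_hot_window("ACDEFG", position=2, half_width=1)
    print(len(x), sum(x))
    print(binary_metrics([1, 0, 1, 0], [1, 0, 0, 0]))
    print(auroc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

In practice, the feature vectors would be fed to a neural network classifier (step 2), and the predicted probabilities would be passed to the metric functions; library implementations such as those in scikit-learn [6] are the more usual choice for evaluation.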

[1] Hao Wang, et al. Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting, 2018, Scientific Reports.

[2] Jason E Gestwicki, et al. Inhibitors of protein-protein interactions (PPIs): an analysis of scaffold choices and buried surface area, 2018, Current opinion in chemical biology.

[3] Alexandre M J J Bonvin, et al. SpotOn: High Accuracy Identification of Protein-Protein Interface Hot-Spots, 2017, Scientific Reports.

[4] Pedro A Fernandes, et al. Hot spots: A review of the protein–protein interface determinant amino‐acid residues, 2007, Proteins.

[5] Irina S. Moreira, et al. A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces, 2016, International journal of molecular sciences.

[6] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[7] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.

[8] Colin Raffel, et al. Thermometer Encoding: One Hot Way To Resist Adversarial Examples, 2018, ICLR.

[9] Wes McKinney, et al. Data Structures for Statistical Computing in Python, 2010, SciPy.

[10] Robert J. Cox, et al. A Method for Optimal Division of Data Sets for Use in Neural Networks, 2005, KES.

[11] Igor Jurisica, et al. IID 2018 update: context-specific physical protein–protein interactions in human, model organisms and domesticated species, 2018, Nucleic Acids Res.

[12] Gaël Varoquaux, et al. The NumPy Array: A Structure for Efficient Numerical Computation, 2011, Computing in Science & Engineering.

[13] Bartek Wilczynski, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics, 2009, Bioinform.

[14] Christoph Sommer, et al. Machine learning in cell biology – teaching computers to recognize phenotypes, 2013, Journal of Cell Science.

[15] William Stafford Noble, et al. Machine learning applications in genetics and genomics, 2015, Nature Reviews Genetics.

[16] Rafael C. Gonzalez, et al. Deep Convolutional Neural Networks [Lecture Notes], 2018, IEEE Signal Processing Magazine.

[17] K. Lage. Protein-protein interactions and genetic diseases: The interactome, 2014, Biochimica et biophysica acta.

[18] David C Fry, et al. Targeting protein-protein interactions for drug discovery, 2015, Methods in molecular biology.

[19] Anil K. Jain, et al. Artificial Neural Networks: A Tutorial, 1996, Computer.

[20] Burkhard Rost, et al. ISIS: interaction sites identified from sequence, 2007, Bioinform.