Mutation effect estimation on protein–protein interactions using deep contextualized representation learning

The functional impact of protein mutations is reflected in altered conformation and thermodynamics of protein-protein interactions (PPIs). Quantifying the changes between two interacting proteins upon mutation is commonly carried out by computational approaches. Hence, extensive research effort has been devoted to extracting energetic or structural features of proteins, followed by statistical learning methods that estimate the effects of mutations on PPI properties. Nonetheless, such features require extensive human labor and expert knowledge to obtain, and have limited ability to reflect point mutations. We present an end-to-end deep learning framework, MuPIPR, to estimate the effects of mutations on PPIs. MuPIPR incorporates a contextualized representation mechanism for amino acids that propagates the effects of a point mutation to the representations of surrounding amino acids, thereby amplifying a subtle change in a long protein sequence. On top of that, MuPIPR leverages a Siamese residual recurrent convolutional neural encoder to encode a wild-type protein pair and its mutant pair. Multi-layer perceptron regressors are then applied to the protein pair representations to predict the quantifiable changes in PPI properties upon mutation. Experimental evaluations show that MuPIPR outperforms various state-of-the-art systems on predicting changes in binding affinity and in buried surface area. The software implementation is available at https://github.com/guangyu-zhou/MuPIPR
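The idea that a contextualized representation spreads a point mutation over neighboring residues, and that a shared encoder is applied to both the wild-type and the mutant pair, can be illustrated with a small sketch. This is not the MuPIPR implementation: the bidirectional tanh recurrence, the random untrained weights, the mean pooling, and the linear readout that stands in for the multi-layer perceptron regressors are all assumptions made for illustration, as are the function names and the example sequences.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Hypothetical, untrained parameters (random, fixed seed) for illustration only.
rng = np.random.default_rng(0)
DIM = 8
EMBED = rng.normal(size=(len(AMINO_ACIDS), DIM))   # static per-residue table
W_IN = rng.normal(scale=0.5, size=(DIM, DIM))      # input weights of the toy RNN
W_REC = rng.normal(scale=0.5, size=(DIM, DIM))     # recurrent weights of the toy RNN
W_OUT = rng.normal(scale=0.5, size=(4 * DIM, 1))   # linear readout (stand-in for an MLP)


def static_embed(seq):
    """Context-free embedding: each residue maps to a fixed vector."""
    return EMBED[[AA_INDEX[aa] for aa in seq]]


def contextual_embed(seq):
    """Toy bidirectional recurrence: every position sees left and right context."""
    x = static_embed(seq)

    def run(inputs):
        h = np.zeros(DIM)
        states = []
        for v in inputs:
            h = np.tanh(v @ W_IN + h @ W_REC)
            states.append(h)
        return np.array(states)

    forward = run(x)
    backward = run(x[::-1])[::-1]
    return np.concatenate([forward, backward], axis=1)


def changed_positions(a, b, tol=1e-9):
    """Number of sequence positions whose vectors differ between a and b."""
    return int(np.sum(np.any(np.abs(a - b) > tol, axis=1)))


def encode_pair(seq_a, seq_b):
    """Shared (Siamese-style) encoder: the same weights encode every pair."""
    return np.concatenate([contextual_embed(seq_a).mean(axis=0),
                           contextual_embed(seq_b).mean(axis=0)])


def predict_change(wild_pair, mutant_pair):
    """Score the mutation from the difference of the two pair encodings."""
    diff = encode_pair(*mutant_pair) - encode_pair(*wild_pair)
    return float((diff @ W_OUT)[0])


if __name__ == "__main__":
    wild = "MKTAYIAKQRQISFVK"            # made-up sequence
    mutant = wild[:7] + "W" + wild[8:]   # point mutation K -> W at index 7
    partner = "GSHMLEDPVR"               # made-up binding partner
    print("positions changed (static):",
          changed_positions(static_embed(wild), static_embed(mutant)))
    print("positions changed (contextual):",
          changed_positions(contextual_embed(wild), contextual_embed(mutant)))
    print("untrained score:", predict_change((wild, partner), (mutant, partner)))
```

With a static table only the mutated position changes, whereas the recurrent pass alters the vectors of neighboring positions as well, which is the amplification effect the abstract describes. The score printed here carries no biological meaning because the weights are random; in a trained model the regressor would be fitted to measured changes such as binding affinity.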
