A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces

Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology that predicts Hot-Spots (HS) at protein-protein interfaces from the native complex structure more accurately than previously published Machine Learning (ML) techniques. Our model is trained on a large number of complexes and on a significantly larger number of structural and evolutionary sequence-based features. In particular, we added interface size, the type of interaction between residues at the interface of the complex, the number of different types of residues at the interface, and the Position-Specific Scoring Matrix (PSSM), for a total of 79 features. We tested twenty-seven algorithms, ranging from a simple linear function to support-vector machine models with different cost functions. The best model was obtained with the conditional inference random forest (c-forest) algorithm on a dataset pre-processed by feature normalization and up-sampling of the minority class. On the independent test set, the method achieves an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82.
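
As an illustration of the pre-processing and evaluation terms used above, the following is a minimal Python sketch, not the pipeline used in this work. It assumes z-score scaling as the form of feature normalization, random duplication with replacement as the form of up-sampling, and the class labels "HS" and "NS"; the function names `normalize`, `upsample_minority` and `binary_metrics` are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def normalize(column):
    """Z-score a single feature column (assumed form of normalization)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = var ** 0.5 or 1.0  # avoid division by zero for constant columns
    return [(x - mean) / std for x in column]


def upsample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive="HS"):
    """Accuracy, sensitivity, specificity and F1 from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels only; these are not data from the study.
    y_true = ["HS"] * 4 + ["NS"] * 6
    y_pred = ["HS", "HS", "HS", "NS"] + ["NS"] * 5 + ["HS"]
    print(binary_metrics(y_true, y_pred))
```

In a setup like this, up-sampling would normally be applied to the training portion only, so that duplicated rows cannot leak into the independent test set on which the metrics are reported.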

[98]  G. Marius Clore,et al.  Improving the Packing and Accuracy of NMR Structures with a Pseudopotential for the Radius of Gyration , 1999 .