Supervised Classification Applied to Hot-Spot Detection in Protein-Protein Interfaces

The identification of protein complexes and interactions is crucial for understanding cellular organization and machinery. Because experimental data on these interactions are difficult to obtain, computational tools and methodologies are emerging as reliable alternatives. Machine-Learning (ML) algorithms in particular hold great promise for protein interaction research: by identifying biologically relevant patterns, they accelerate our knowledge of the functional mechanisms of proteins within cells. Over the last decades, improvements in a large number of computational techniques have led to significant cost decreases and to increases in throughput by orders of magnitude. However, their accuracy is still far from optimal, which leaves room for improvement. In this work, we developed and applied computer modelling techniques that go beyond the current state of the art, leading to quantitative and reliable molecular-level predictions of Hot-Spots (HS) at protein-protein complexes. We explored the feasibility of using ML for HS detection and compared different classifiers as well as different preprocessing conditions. Based on this evaluation, we concluded that applying the C5.0 algorithm with minority-class up-sampling leads to accurate results. The overall accuracy on an independent test set was 0.88. Given the relevance of this topic to the large scientific community working on structural biology, we have assembled a freely available web server that can be found at: http://milou.science.uu.nl/cgi/servicesdevel/SPOTON/spoton/
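
As an illustration of the preprocessing step named above, the following is a minimal sketch of random minority-class up-sampling, assuming a binary labelling of interface residues. It is not the authors' pipeline: the function name `upsample_minority`, the toy feature values, and the class labels "HS" and "NS" are hypothetical, and training the C5.0 classifier itself is not shown. Up-sampling of this kind is normally applied to the training set only, so that the independent test set keeps its original class distribution.

```python
import random
from collections import Counter


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the feature rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is padded up to the size of the largest one.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra rows with replacement from the class's own members.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: one made-up feature per residue,
# four non-hot-spot residues ("NS") and two hot-spots ("HS").
rows = [[0.1], [0.2], [0.3], [0.4], [0.9], [0.8]]
labels = ["NS", "NS", "NS", "NS", "HS", "HS"]

balanced_rows, balanced_labels = upsample_minority(rows, labels)
print(Counter(balanced_labels))
```

After balancing, both classes contain the same number of rows, and the added "HS" rows are exact copies of existing ones rather than synthetic points.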
