A feature-based approach to modeling protein–protein interaction hot spots

Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, drawing on structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using a support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation shows a complicated tendency that depends on the type of protein complex, indicating that conservation is not a reliable feature for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top three, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π-related interactions, especially π···π interactions.
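
The feature-selection step can be illustrated with a minimal sketch. The abstract does not specify how the decision tree is used to select features, so the code below substitutes a simpler stand-in: each feature is scored by the best Gini impurity reduction that a single-threshold split (a depth-one tree) achieves on it, and features are ranked by that score. The function names, the three feature names and all numeric values are hypothetical toy data, not values from the study, and the SVM stage is not shown.

```python
from typing import List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = hot spot, 0 = non-hot spot)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_stump_gain(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest impurity reduction from one threshold split on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Splitting at the maximum value would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    names: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every feature column and sort from most to least informative."""
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scored.append((name, best_stump_gain(column, labels)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one row per interface residue.
names = ["packing_density", "rel_burial", "conservation"]
rows = [
    (0.9, 0.8, 0.5),
    (0.8, 0.4, 0.1),
    (0.7, 0.9, 0.9),
    (0.3, 0.7, 0.4),
    (0.2, 0.2, 0.8),
    (0.1, 0.3, 0.2),
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In a fuller pipeline, the top-ranked features would then be passed to an SVM classifier and evaluated on a held-out test set, as the abstract describes.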
