Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

Hot spot residues are fundamental interface residues that help proteins perform their functions. Detecting hot spots experimentally is costly and time-consuming. Sequence and structural information has been widely used in the computational prediction of hot spots, but structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from the 544 properties in AAindex1, an amino acid index database. Each feature was used, together with a novel encoding schema, to train a hot spot classification model with the IBk algorithm, an extension of the K-nearest neighbor algorithm. We then explored combinations of the individual classifiers and selected the classifiers that appeared frequently in the top-performing combinations. The hot spot predictor was built as an ensemble of these classifiers that makes its prediction by voting. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Proteins 2013; 81:1351–1362 © 2013 Wiley Periodicals, Inc.
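
The pipeline described above (one nearest-neighbor classifier per physicochemical feature, combined by voting) can be illustrated with a minimal sketch. This is not the authors' implementation. The window encoding, the three property tables, the toy training data, and all function names are assumptions made for illustration; the abstract does not describe the paper's encoding schema or its classifier-selection step, so neither is reproduced here. A plain k-nearest-neighbor vote stands in for IBk.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values: one example of a physicochemical index.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
# Simplified illustrative tables (assumptions, not AAindex1 entries).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})
AROMATIC = {aa: 1.0 if aa in "FWY" else 0.0 for aa in HYDROPATHY}
TABLES = [HYDROPATHY, CHARGE, AROMATIC]

def encode_window(seq, pos, table, half_width=1):
    """Map the residues around `pos` to property values (assumed encoding).

    Positions that fall outside the sequence are padded with 0.0.
    """
    return [
        table[seq[i]] if 0 <= i < len(seq) else 0.0
        for i in range(pos - half_width, pos + half_width + 1)
    ]

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training vectors closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def ensemble_predict(train, query, tables, k=3, half_width=1):
    """Train one k-NN classifier per property table and take a majority vote.

    train: list of (sequence, position, label) with label 1 = hot spot.
    query: (sequence, position).
    """
    labels = [label for _, _, label in train]
    votes = 0
    for table in tables:
        X = [encode_window(s, p, table, half_width) for s, p, _ in train]
        q = encode_window(query[0], query[1], table, half_width)
        votes += knn_predict(X, labels, q, k)
    return votes * 2 > len(tables)

# Invented toy examples, only to show the call shape; not benchmark data.
TOY_TRAIN = [("AWA", 1, 1), ("GYG", 1, 1), ("ASA", 1, 0), ("GTG", 1, 0)]

if __name__ == "__main__":
    print(ensemble_predict(TOY_TRAIN, ("AWA", 1), TABLES, k=1))
```

Because each classifier sees a single property, a query can be decided by whichever properties separate its neighborhood well, which is one way to read the abstract's remark about flexible feature weights for different queries.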
