APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility

Background

It is well known that most of the binding free energy of a protein interaction is contributed by a few key hot spot residues. These residues are crucial for understanding the function of proteins and studying their interactions. Experimental methods for detecting hot spots, such as alanine scanning mutagenesis, are time consuming and expensive, so they cannot be applied on a large scale. Reliable and efficient computational methods for identifying hot spots are therefore needed.

Results

In this work, we introduce an efficient approach that uses a support vector machine (SVM) to predict hot spot residues in protein interfaces. We systematically investigate 62 features drawn from a combination of protein sequence and structure information. To remove redundant and irrelevant features and improve prediction performance, feature selection is then performed using the F-score method. Based on the selected features, nine individual-feature based predictors are developed to identify hot spots using SVMs. Furthermore, a new ensemble classifier, APIS (a combined model based on Protrusion Index and Solvent accessibility), is developed to further improve the prediction accuracy. The results on two benchmark datasets, ASEdb and BID, show that the proposed method yields significantly better prediction accuracy than previously published methods. We also demonstrate the predictive power of the method on two protein complexes: the calmodulin/myosin light chain kinase complex and the complex of the heat shock locus gene products U and V. In both cases, our method identifies more hot spots than other state-of-the-art methods.

Conclusion

We have developed an accurate prediction model for hot spot residues, given the structure of a protein complex. A major contribution of this study is a set of new features based on the protrusion index of amino acid residues, which are shown to significantly improve hot spot prediction performance. Moreover, we identify a compact and useful feature subset that has important implications for identifying hot spot residues. Our results indicate that these features are more effective than evolutionary conservation, pairwise residue potentials and other traditional features considered previously, and that combining our features with traditional ones may yield a discriminative feature set for efficient prediction of hot spot residues. The data and source code are available at http://home.ustc.edu.cn/~jfxia/hotspot.html.
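The feature selection step can be illustrated with a minimal Python sketch. It assumes the commonly used two-class definition of the F-score: for each feature, the squared deviations of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The abstract does not state the formula, so this definition, the function names `f_score` and `rank_features`, and the toy data are assumptions for illustration, not the authors' code.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the positive
    (hot spot) and negative (non-hot spot) classes.
    Assumed definition; each class needs at least two values."""
    overall = mean(list(pos) + list(neg))
    # Between-class term: how far each class mean sits from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class term: sum of sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Return (order, scores): feature indices sorted by decreasing
    F-score, and the per-feature scores. X is a list of rows,
    y a list of labels (1 = hot spot, 0 = non-hot spot)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 4.0], [8.0, 5.0], [9.0, 4.0]]
y = [0, 0, 1, 1]
order, scores = rank_features(X, y)
```

A larger F-score suggests a more discriminative feature, so a subset can be chosen by keeping the top-ranked features and checking prediction performance on held-out data. Note that the score treats each feature independently and does not account for redundancy between features.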
