Prediction of protein-protein binding site by using core interface residue and support vector machine

Background
The prediction of protein-protein binding sites can provide structural annotation for the protein interaction data produced by proteomics studies. This is important for the biological application of these rapidly growing data. Moreover, methods for predicting protein interaction sites can provide crucial information for improving the speed and accuracy of protein docking methods.

Results
In this work, we describe a binding site prediction method that uses a new residue neighbour profile and selects only the core interface residues for SVM training. The residue neighbour profile includes both the sequential and the spatial neighbours of an interface residue, giving a more complete description of the physical and chemical characteristics surrounding that residue. The concept of the core interface is applied to select the interface residues used for training the SVM models, and this is shown to give better discrimination between core interface residues and other residues.

The best trained SVM model was tested on a set of 50 randomly selected proteins. The sensitivity, specificity, and MCC for the prediction of the core interface residues were 60.6%, 53.4%, and 0.243, respectively. On this test set, our predictions were compared with those of three other binding site prediction methods, and our method performed better. Furthermore, our method was tested on the 101 unbound proteins from the protein-protein interaction benchmark v2.0. The sensitivity, specificity, and MCC of this test were 57.5%, 32.5%, and 0.168, respectively.

Conclusion
By improving the description of the interface residues and their surrounding environment, as well as the training strategy, we obtained better SVM models that outperform previous methods. Our tests on unbound protein structures suggest that further improvement is possible.
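
The residue neighbour profile combines a residue's sequential and spatial neighbours. The Python sketch below shows one way such a profile could be built, under stated assumptions that are not taken from the paper: each residue is represented by a single coordinate (for example its Cα atom), sequential neighbours are those within ±`window` positions in the chain, spatial neighbours lie within `cutoff` Å, and the profile is the mean of per-residue feature vectors over the combined neighbourhood. The names `neighbour_indices`, `neighbour_profile`, and `one_hot`, and the defaults `window=3` and `cutoff=8.0`, are hypothetical; in practice the per-residue features might be evolutionary profile rows rather than the one-hot encoding used here.

```python
import math
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Coord = Tuple[float, float, float]


def neighbour_indices(coords: Sequence[Coord], i: int,
                      window: int = 3, cutoff: float = 8.0) -> List[int]:
    """Indices of the sequential and spatial neighbours of residue i (i included).

    Hypothetical assumptions: one coordinate per residue, sequential neighbours
    within +/- window chain positions, spatial neighbours within cutoff angstroms.
    """
    n = len(coords)
    sequential = set(range(max(0, i - window), min(n, i + window + 1)))
    spatial = {j for j in range(n) if math.dist(coords[i], coords[j]) <= cutoff}
    return sorted(sequential | spatial)


def one_hot(residue: str) -> List[float]:
    """20-dimensional one-hot encoding; a stand-in for richer per-residue features."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def neighbour_profile(features: Sequence[Sequence[float]], coords: Sequence[Coord],
                      i: int, window: int = 3, cutoff: float = 8.0) -> List[float]:
    """Mean feature vector over the combined sequential + spatial neighbourhood of i."""
    idx = neighbour_indices(coords, i, window, cutoff)
    dim = len(features[0])
    return [sum(features[j][d] for j in idx) / len(idx) for d in range(dim)]


if __name__ == "__main__":
    # Toy chain: residues 3.8 A apart along x, with residue 9 folded back next to residue 0.
    coords = [(3.8 * k, 0.0, 0.0) for k in range(10)]
    coords[9] = (0.0, 4.0, 0.0)
    features = [one_hot(aa) for aa in "ACDEFGHIKL"]
    print(neighbour_indices(coords, 0, window=1, cutoff=6.0))  # [0, 1, 9]
    profile = neighbour_profile(features, coords, 0, window=1, cutoff=6.0)
    print({aa: round(v, 3) for aa, v in zip(AMINO_ACIDS, profile) if v})
```

Taking the union of the two neighbour sets means that a residue distant in sequence but packed against the target residue in space (residue 9 in the toy chain) still contributes to the target's profile, which is the motivation for including spatial neighbours alongside the sequence window.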

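The abstract reports sensitivity, specificity, and MCC. The sketch below computes them from confusion-matrix counts. One assumption deserves to be stated loudly: the reported "specificity" is most consistent with the precision-style definition TP/(TP+FP). With the textbook definition TN/(TN+FP), a sensitivity of 57.5% and a specificity of 32.5% would mean the true positive rate is below the false positive rate, and because the MCC numerator TP·TN − FP·FN has the same sign as that difference, the MCC would be negative rather than the reported 0.168. This is an inference from the numbers, not a definition given in the abstract. Function names are hypothetical and the counts are illustrative, not data from the paper.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true interface residues that are predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision_style_specificity(tp: int, fp: int) -> float:
    """Fraction of predicted interface residues that are correct: TP / (TP + FP).

    Assumed reading of the abstract's "specificity"; not confirmed by the source.
    """
    return tp / (tp + fp)


def textbook_specificity(tn: int, fp: int) -> float:
    """Fraction of non-interface residues correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the paper).
    tp, fp, tn, fn = 60, 40, 160, 40
    print(sensitivity(tp, fn), precision_style_specificity(tp, fp), mcc(tp, fp, tn, fn))
    # -> 0.6 0.6 0.4
```

Plugging in hypothetical counts with 57.5% sensitivity and a textbook specificity of 32.5% (for example TP=575, FN=425, TN=325, FP=675) gives a negative MCC, which is why the precision-style reading appears the more likely one for the results above.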