Exploiting sequence and structure homologs to identify protein–protein binding sites

The rapid increase in the number of experimentally derived three‐dimensional structures provides an opportunity to better understand, and subsequently predict, protein–protein interactions. In this study, structurally conserved residues were derived from multiple structure alignments of the individual components of known complexes. The conservation score assigned to each residue was weighted by the crystallographic B factor to account for structural flexibility, which can degrade alignment quality. Sequence profile and accessible surface area information were then combined with the conservation score to predict protein–protein binding sites using a Support Vector Machine (SVM). Incorporating the conservation score significantly improved the performance of the SVM. About 52% of the binding sites were precisely predicted (more than 70% of the residues in the site were identified), 77% were correctly predicted (more than 50% of the residues in the site were identified), and 21% were partially covered by the predicted residues (some residues were identified). The results support the hypothesis that, in many cases, protein interfaces require some residues to provide rigidity and so minimize the entropic cost of complex formation. Proteins 2006. © 2005 Wiley‐Liss, Inc.
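The coverage tiers used to report the results can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' evaluation code. The function names and the "missed" label are assumptions, and each site is assigned to the most stringent tier it meets, whereas the 77% figure in the abstract appears to include the precisely predicted sites as well.

```python
def coverage(true_site, predicted):
    """Fraction of the true binding-site residues that were predicted."""
    true_site = set(true_site)
    if not true_site:
        raise ValueError("binding site must contain at least one residue")
    return len(true_site & set(predicted)) / len(true_site)


def classify_site(true_site, predicted):
    """Assign a site to the most stringent coverage tier it satisfies.

    Thresholds follow the abstract: >70% precise, >50% correct,
    any overlap partial. "missed" is a hypothetical label for no overlap.
    """
    frac = coverage(true_site, predicted)
    if frac > 0.70:
        return "precise"
    if frac > 0.50:
        return "correct"
    if frac > 0.0:
        return "partial"
    return "missed"


if __name__ == "__main__":
    site = range(1, 11)                      # ten interface residues (made-up indices)
    print(classify_site(site, range(1, 9)))  # 8 of 10 residues recovered
    print(classify_site(site, range(1, 7)))  # 6 of 10 residues recovered
    print(classify_site(site, [1, 2, 40]))   # 2 of 10 residues recovered
```

Predicted residues that fall outside the true site do not affect the tier in this sketch; it measures only how much of the site is covered, which is how the abstract defines its three categories.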
