Statistical analysis and prediction of protein–protein interfaces

Predicting protein–protein interfaces from a three‐dimensional structure is a key task of computational structural proteomics. In contrast to geometrically distinct small-molecule binding sites, protein–protein interfaces are notoriously difficult to predict. We generated a large nonredundant data set of 1494 true protein–protein interfaces, using biological symmetry annotation where necessary. The data set was carefully analyzed, and a Support Vector Machine was trained on a combination of a new, robust evolutionary conservation signal and local surface properties to predict protein–protein interfaces. Fivefold cross-validation verifies the high sensitivity and selectivity of the model. As many as 97% of the predicted patches overlapped with the true interface patch, while only 22% of the surface residues were included in an average predicted patch. The model allowed the identification of potential new interfaces and the correction of mislabeled oligomeric states. Proteins 2005. © 2005 Wiley‐Liss, Inc.
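
The two summary figures quoted above (the share of predicted patches that overlap the true interface, and the share of surface residues an average predicted patch contains) can be illustrated with a minimal sketch. It assumes that each protein is described by three sets of residue indices and that "overlap" means at least one shared residue; the abstract does not state the exact criterion, and the function names and toy data are hypothetical rather than taken from the paper.

```python
from typing import Iterable, Set, Tuple

# One protein: (predicted patch, true interface patch, all surface residues),
# each given as a set of residue indices. This layout is an assumption.
Protein = Tuple[Set[int], Set[int], Set[int]]


def patch_overlaps(predicted: Set[int], true_interface: Set[int]) -> bool:
    """True if the predicted patch shares at least one residue with the true patch."""
    return len(predicted & true_interface) > 0


def surface_fraction(predicted: Set[int], surface: Set[int]) -> float:
    """Fraction of surface residues that fall inside the predicted patch."""
    if not surface:
        raise ValueError("surface residue set must not be empty")
    return len(predicted & surface) / len(surface)


def summarize(proteins: Iterable[Protein]) -> Tuple[float, float]:
    """Return (fraction of overlapping patches, mean surface fraction per patch)."""
    proteins = list(proteins)
    if not proteins:
        raise ValueError("at least one protein is required")
    n_overlap = sum(patch_overlaps(pred, true) for pred, true, _ in proteins)
    mean_frac = sum(surface_fraction(pred, surf) for pred, _, surf in proteins) / len(proteins)
    return n_overlap / len(proteins), mean_frac


if __name__ == "__main__":
    # Toy data only: two made-up proteins with ten surface residues each.
    toy = [
        ({1, 2, 3}, {3, 4}, set(range(1, 11))),  # prediction touches the true patch
        ({5, 6}, {8, 9}, set(range(1, 11))),     # prediction misses the true patch
    ]
    overlap_rate, mean_surface = summarize(toy)
    print(f"overlapping patches: {overlap_rate:.2f}, mean surface fraction: {mean_surface:.2f}")
```

Reporting both numbers together matters because the overlap rate alone can be inflated by predicting very large patches; the surface fraction shows how selective the predictions are.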
