Improved prediction of protein-protein binding sites using a support vector machines approach.

MOTIVATION Structural genomics projects are beginning to produce protein structures with unknown function; accurate, automated predictors of protein function are therefore required if all these structures are to be properly annotated in a reasonable time. Identifying the interface between two interacting proteins provides important clues to the function of a protein and can reduce the search space required by docking algorithms to predict the structures of complexes.

RESULTS We have combined a support vector machine (SVM) approach with surface patch analysis to predict protein-protein binding sites. Using a leave-one-out cross-validation procedure, we successfully predicted the location of the binding site on 76% of our dataset, which is made up of proteins with both transient and obligate interfaces. With heterogeneous cross-validation, where we trained the SVM on transient complexes to predict on obligate complexes (and vice versa), we still achieved success rates comparable to those of the leave-one-out cross-validation, suggesting that sufficient properties are shared between transient and obligate interfaces.

AVAILABILITY A web application based on the method can be found at http://www.bioinformatics.leeds.ac.uk/ppi_pred. The dataset of 180 proteins used in this study is also available via the same web site.

CONTACT westhead@bmb.leeds.ac.uk

SUPPLEMENTARY INFORMATION http://www.bioinformatics.leeds.ac.uk/ppi-pred/supp-material.
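
The two evaluation protocols named in the RESULTS (leave-one-out and heterogeneous cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the record layout, the function names, the toy feature values, and the success criterion (the top-scoring patch on a held-out protein is an interface patch). The `centroid_trainer` function is a simple stand-in so the example runs with the standard library only; it is not an SVM and would be replaced by a real SVM trainer in practice.

```python
# Hypothetical record layout (an assumption, not the paper's data format):
#   protein = (protein_id, interface_type, patches)
#   patch   = (feature_vector, is_interface_patch)


def leave_one_protein_out(proteins):
    """Yield (training proteins, [held-out protein]) once per protein."""
    for i, held_out in enumerate(proteins):
        yield proteins[:i] + proteins[i + 1:], [held_out]


def heterogeneous_splits(proteins):
    """Train on one interface type and test on the other, in both directions."""
    transient = [p for p in proteins if p[1] == "transient"]
    obligate = [p for p in proteins if p[1] == "obligate"]
    yield transient, obligate
    yield obligate, transient


def centroid_trainer(patches):
    """Stand-in for SVM training (NOT an SVM).

    Returns a scorer that projects a feature vector onto the difference
    between the mean interface patch and the mean non-interface patch.
    """
    dim = len(patches[0][0])

    def mean(rows):
        if not rows:
            return [0.0] * dim
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    pos = mean([x for x, label in patches if label])
    neg = mean([x for x, label in patches if not label])
    weights = [a - b for a, b in zip(pos, neg)]
    return lambda x: sum(w * xi for w, xi in zip(weights, x))


def success_rate(splits, train):
    """Fraction of test proteins whose top-scoring patch is an interface patch.

    This success criterion is an illustrative assumption.
    """
    hits = total = 0
    for training, testing in splits:
        scorer = train([patch for _, _, patches in training for patch in patches])
        for _, _, patches in testing:
            _, best_is_interface = max(patches, key=lambda p: scorer(p[0]))
            hits += best_is_interface
            total += 1
    return hits / total


# Toy data with made-up two-dimensional patch features.
toy = [
    ("A", "transient", [([1.0, 0.2], True), ([0.1, 0.9], False)]),
    ("B", "transient", [([0.9, 0.1], True), ([0.2, 0.8], False)]),
    ("C", "obligate", [([0.8, 0.3], True), ([0.0, 1.0], False)]),
    ("D", "obligate", [([0.7, 0.0], True), ([0.3, 0.7], False)]),
]

loo = success_rate(leave_one_protein_out(toy), centroid_trainer)
het = success_rate(heterogeneous_splits(toy), centroid_trainer)
print(loo, het)
```

Splitting by protein rather than by patch keeps all patches of a held-out protein out of training, which is the point of both protocols; the heterogeneous split additionally checks whether a model trained on one interface type transfers to the other.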
