Predicting Protein-Protein Interaction Sites From Amino Acid Sequence

We describe an approach for computational prediction of protein-protein interaction sites using a support vector machine (SVM) classifier. Interface residues and other surface residues were extracted from 115 proteins derived from a set of 70 heterocomplexes in the Protein Data Bank (PDB). The SVM classifier was trained to predict whether a surface residue is located in the interface, based on the identity of the target residue and its 10 sequence neighbors. The effectiveness of the approach was evaluated using 115 leave-one-out cross-validation (jackknife) experiments. In each experiment, an SVM classifier was trained on a set of 1250 randomly chosen interface residues and an equal number of non-interface residues from 114 of the 115 molecules. The resulting classifier was then used to classify the surface residues of the remaining molecule as interface or non-interface residues, and its output was evaluated in terms of several performance measures. In results averaged over the 115 experiments, interface residues and non-interface residues were identified with relatively high specificity (71%) and sensitivity (67%), and with a correlation coefficient of 0.29 between predicted and actual class labels, indicating that the method performs substantially better than chance (zero correlation). We also investigated the classifier's performance in terms of overall interaction site recognition. In 80% of the proteins, the classifier recognized the interaction surface by identifying at least half of the interface residues, and in 98% of the proteins, at least 20% of the interface residues were correctly identified. The success of this approach was confirmed by examining predicted interfaces in the context of the three-dimensional structures of representative complexes. This study demonstrates that an SVM classifier can predict whether a surface residue is an interface residue using amino acid sequence information. Because surface residues can be identified from their solvent accessible surface area (ASA), and given recent progress in computational methods for predicting ASA from sequence, the approach described in this paper provides a basis for computational prediction of interaction sites in proteins for which only amino acid sequence information is available.
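The pipeline summarized above (sequence-window features, leave-one-protein-out evaluation, and agreement measures between predicted and actual labels) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration only: the one-hot encoding, the symmetric placement of the 10 neighbors (5 on each side of the target residue), the function names, and the textbook definitions of sensitivity and specificity, which may differ from the definitions used in the paper. The SVM training step itself is omitted.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_window(sequence, center, half_width=5):
    """One-hot encode the residue at `center` plus its sequence neighbors.

    Assumption: the 10 neighbors are taken symmetrically (5 per side).
    Positions past either end of the sequence, and non-standard residues,
    are encoded as all-zero slots.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features


def leave_one_out(proteins):
    """Yield (training_proteins, held_out_protein) pairs, one per protein."""
    for i, held_out in enumerate(proteins):
        yield proteins[:i] + proteins[i + 1:], held_out


def evaluate(actual, predicted):
    """Compare 0/1 labels (1 = interface) using textbook definitions.

    Assumption: sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP);
    the correlation coefficient is the Matthews correlation coefficient.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # A correlation of zero corresponds to chance-level prediction.
        "correlation": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

With these assumptions, each surface residue yields a vector of 11 × 20 = 220 binary features, and `leave_one_out` applied to 115 proteins produces the 115 train/test splits described in the abstract.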
