Identification of Surface Residues Involved in Protein-Protein Interaction — A Support Vector Machine Approach

We describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether or not a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface) based on the identity of the target residue and its 10 sequence neighbors. Separate classifiers were trained on proteins from two categories of complexes, antibody-antigen and protease-inhibitor. The effectiveness of each classifier was evaluated using leave-one-out (jack-knife) cross-validation. Interface and non-interface residues were classified with relatively high sensitivity (82.3% and 78.5%) and specificity (81.0% and 77.6%) for proteins in the antibody-antigen and protease-inhibitor complexes, respectively. The correlation between predicted and actual labels was 0.430 and 0.462, indicating that the method performs substantially better than chance (zero correlation). Combined with recently developed methods for identifying surface residues from sequence information, this offers a promising approach to predicting residues involved in protein-protein interaction from sequence information alone.
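The input representation and the evaluation measures described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the 10 sequence neighbors are the 5 residues on each side of the target, that each window position is one-hot encoded over the 20 standard amino acids, that specificity is TN / (TN + FP), and that the reported correlation is the Matthews correlation coefficient. The abstract does not spell out any of these choices, and the helper names `window`, `one_hot`, and `evaluate` are hypothetical.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def window(sequence, i, flank=5, pad="-"):
    """Return residue i plus `flank` neighbors on each side.

    Positions that fall off either end of the sequence are filled with
    `pad` (an assumed convention, not taken from the paper).
    """
    padded = pad * flank + sequence + pad * flank
    return padded[i : i + 2 * flank + 1]


def one_hot(win):
    """Encode a window as a flat 0/1 vector, 20 entries per position.

    Padding or unknown characters become an all-zero block.
    """
    vec = []
    for res in win:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec


def evaluate(actual, predicted):
    """Sensitivity, specificity, and Matthews correlation for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc
```

Under these assumptions each surface residue becomes a 220-dimensional vector (11 positions times 20 amino acids) that could be passed to any SVM implementation. A correlation of zero from `evaluate` corresponds to the chance-level baseline mentioned in the abstract.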
