Predicting protein-protein interactions from primary structure

MOTIVATION: An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. The expectation is that this will give a fuller picture of cellular processes and networks at the protein level. That picture should ultimately lead to a better understanding of disease mechanisms and suggest new means of intervention. This paper asks whether protein-protein interactions can be predicted directly from primary structure and associated data. Using a diverse database of known protein interactions, a Support Vector Machine (SVM) learning system was trained to recognize and predict interactions based solely on primary structure and associated physicochemical properties.

RESULTS: The inductive accuracy of the trained system averaged 80% over the ensemble of statistical experiments. Inductive accuracy is defined here as the percentage of correct protein interaction predictions on previously unseen test sets. Future proteomics studies may benefit from this research by proceeding directly from the automated identification of a cell's gene products to the prediction of interacting protein pairs.
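The abstract does not specify how sequences were encoded or which kernel and solver were used. The following is therefore only a minimal sketch of the general idea under stated assumptions. Each protein is summarized by its amino-acid composition plus its mean Kyte-Doolittle hydropathy, which stands in for "associated physicochemical properties". A candidate pair is the concatenation of both proteins' features. A plain linear SVM, trained by Pegasos-style subgradient descent, replaces whatever SVM configuration the authors actually used. The function names (`encode_protein`, `encode_pair`, `train_linear_svm`, `predict`, `accuracy`) are hypothetical. `accuracy` computes the percentage of correct predictions, matching the abstract's definition when it is applied to a held-out test set.

```python
import random

# Kyte-Doolittle hydropathy index: one illustrative physicochemical property
# (an assumption; the abstract does not list the properties used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)  # fixed order for the 20 standard residues


def encode_protein(seq):
    """Hypothetical encoding: 20 composition fractions plus mean hydropathy."""
    residues = [aa for aa in seq.upper() if aa in KD]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard residues")
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KD[aa] for aa in residues) / n
    return composition + [mean_hydropathy]


def encode_pair(seq_a, seq_b):
    """Concatenate both proteins' features and append a constant bias term."""
    return encode_protein(seq_a) + encode_protein(seq_b) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels are +1 (interacting pair) or -1 (non-interacting pair).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (predicted interaction) or -1 (predicted non-interaction)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def accuracy(predictions, labels):
    """Percentage of correct predictions."""
    correct = sum(1 for p, l in zip(predictions, labels) if p == l)
    return 100.0 * correct / len(labels)
```

To estimate inductive accuracy in the abstract's sense, the labeled pairs would be split into disjoint training and test sets. The model is fitted on the first set, and `accuracy` is reported only on the second. The score is then averaged over repeated random splits. Concatenating the two feature vectors makes the encoding order-dependent: (A, B) and (B, A) map to different vectors. A symmetric combination, such as the element-wise sum, is one alternative worth considering.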
