Prediction of protein-protein interactions

Knowledge of protein-protein interactions is useful for elucidating protein function through the concept of ‘guilt-by-association’. A statistical learning method, the support vector machine (SVM), has recently been explored for predicting protein-protein interactions, using artificially shuffled sequences to represent hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455–460). It remains unclear, however, how prediction accuracy is affected when real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results obtained with real protein sequences against those obtained with shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are derived from an exclusion analysis combined with the subcellular localization information of interacting proteins found in the Database of Interacting Proteins (DIP). Prediction accuracy is 76.9% using real protein sequences, compared with 94.1% using artificially shuffled sequences. The discrepancy likely arises because two sets of real protein sequences are expected to be harder to separate than a set of real protein sequences and a set of artificial sequences. Training an SVM classification system on real protein sequences is nevertheless expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful for developing SVM classification systems for protein-protein interaction prediction.
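The two ways of building the negative (noninteracting) set contrasted in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure. `shuffled_negative` is a plain composition-preserving permutation of residues: Bock and Gough's exact shuffling scheme is not given here, and a k-let-preserving shuffle such as Shufflet [30] would behave differently. `localization_negatives` applies a hypothetical exclusion rule that accepts two real proteins as a candidate noninteracting pair only if they share no subcellular compartment and are not recorded as interacting. All function names, protein IDs, and compartment labels are invented for illustration.

```python
import itertools
import random


def shuffled_negative(seq, rng):
    """Return a random permutation of seq that keeps its residue composition.

    Simple single-residue shuffle (an assumption, not Bock & Gough's exact
    scheme); unlike a k-let shuffle, it does not preserve dipeptide or
    longer k-mer counts.
    """
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def localization_negatives(localization, interactions):
    """Hypothetical exclusion rule for choosing real noninteracting pairs.

    localization: dict mapping protein ID -> set of compartment labels
    interactions: iterable of (id_a, id_b) pairs known to interact
    Returns (id_a, id_b) pairs that share no compartment and are not
    recorded as interacting.
    """
    known = {frozenset(pair) for pair in interactions}
    negatives = []
    for a, b in itertools.combinations(sorted(localization), 2):
        if localization[a] & localization[b]:
            continue  # co-localized: could plausibly interact, so exclude
        if frozenset((a, b)) in known:
            continue  # documented interaction: cannot serve as a negative
        negatives.append((a, b))
    return negatives


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible shuffle
    print(shuffled_negative("MKTAYIAKQR", rng))

    # Toy data; IDs and compartments are invented for illustration.
    loc = {
        "P1": {"nucleus"},
        "P2": {"cytoplasm"},
        "P3": {"nucleus", "cytoplasm"},
        "P4": {"mitochondrion"},
    }
    print(localization_negatives(loc, [("P2", "P4")]))
    # -> [('P1', 'P2'), ('P1', 'P4'), ('P3', 'P4')]
```

The design choice in this sketch is that excluding co-localized pairs trades coverage for confidence: the surviving pairs are still real sequences, which is the property the abstract argues matters in practice, but the rule only approximates true noninteraction.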

[1] Dmitrij Frishman, et al. MIPS: analysis and annotation of proteins from whole genomes in 2005, 2005, Nucleic Acids Research.

[2] M. Tsujimoto, et al. Overexpression of thioredoxin reductase 1 regulates NF-κB activation, 2004.

[3] James R. Knight, et al. A Protein Interaction Map of Drosophila melanogaster, 2003, Science.

[4] J. Fernandez-Checa, et al. Redox regulation and signaling lipids in mitochondrial apoptosis, 2003, Biochemical and Biophysical Research Communications.

[5] V. Bunik. 2-Oxo acid dehydrogenase complexes in redox regulation, 2003, European Journal of Biochemistry.

[6] B. Rost, et al. Analysing six types of protein-protein interfaces, 2003, Journal of Molecular Biology.

[7] S. Pang, et al. Gene expression profiling of androgen deficiency predicts a pathway of prostate apoptosis that involves genes related to oxidative stress, 2002, Endocrinology.

[8] Kevin Burrage, et al. Prediction of protein solvent accessibility using support vector machines, 2002, Proteins.

[9] B. Snel, et al. Comparative assessment of large-scale data sets of protein-protein interactions, 2002, Nature.

[10] John Yu, et al. Thioredoxin-Related Regulation of NO/NOS Activities, 2002, Annals of the New York Academy of Sciences.

[11] C. Deane, et al. Protein Interactions, 2002, Molecular & Cellular Proteomics.

[12] M. Sternberg, et al. Prediction of protein-protein interactions by docking methods, 2002, Current Opinion in Structural Biology.

[13] M. Gold, et al. CD40 Signaling in B Cells Regulates the Expression of the Pim-1 Kinase Via the NF-κB Pathway, 2002, The Journal of Immunology.

[14] P. Bork, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes, 2002, Nature.

[15] Gary D. Bader, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry, 2002, Nature.

[16] Gary D. Bader, et al. Systematic Genetic Analysis with Ordered Arrays of Yeast Deletion Mutants, 2001, Science.

[17] T. Poggio, et al. Multiclass cancer diagnosis using tumor gene expression signatures, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[18] Bernard F. Buxton, et al. Drug Design by Machine Learning: Support Vector Machines for Pharmaceutical Data Analysis, 2001, Computers & Chemistry.

[19] T. Pawson, et al. SH2 domains, interaction modules and cellular wiring, 2001, Trends in Cell Biology.

[20] R. Czerminski, et al. Use of Support Vector Machine in Pattern Classification: Application to QSAR Studies, 2001.

[21] E. Sprinzak, et al. Correlated sequence-signatures as markers of protein-protein interaction, 2001, Journal of Molecular Biology.

[22] Huan-Xiang Zhou, et al. Prediction of protein interaction sites from sequence profile and residue neighbor list, 2001.

[23] David A. Gough, et al. Predicting protein-protein interactions from primary structure, 2001, Bioinformatics.

[24] S. Hua, et al. A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach, 2001, Journal of Molecular Biology.

[25] Chris H. Q. Ding, et al. Multi-class protein fold recognition using support vector machines and neural networks, 2001, Bioinformatics.

[26] Elias S. J. Arnér, et al. Physiological functions of thioredoxin and thioredoxin reductase, 2000, European Journal of Biochemistry.

[27] Yudong D. He, et al. Functional Discovery via a Compendium of Expression Profiles, 2000, Cell.

[28] S. Oliver. Proteomics: Guilt-by-association goes global, 2000, Nature.

[29] D. Haussler, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines, 2000, Proceedings of the National Academy of Sciences of the United States of America.

[30] Eivind Coward, et al. Shufflet: shuffling sequences while conserving the k-let counts, 1999, Bioinformatics.

[31] D. Eisenberg, et al. Detecting protein function and protein-protein interactions from genome sequences, 1999, Science.

[32] B. Schölkopf, et al. Advances in kernel methods: support vector learning, 1999.

[33] B. Snel, et al. Conservation of gene order: a fingerprint of proteins that physically interact, 1998, Trends in Biochemical Sciences.

[34] Christopher J. C. Burges, et al. A Tutorial on Support Vector Machines for Pattern Recognition, 1998, Data Mining and Knowledge Discovery.

[35] S. Jones, et al. Analysis of protein-protein interaction sites using surface patches, 1997, Journal of Molecular Biology.

[36] M. Pittelkow, et al. Cytotoxicity of 6-biopterin to human melanocytes, 1994, Biochemical and Biophysical Research Communications.

[37] E. Myers, et al. Basic local alignment search tool, 1990, Journal of Molecular Biology.

[38] S. Fields, et al. A novel genetic system to detect protein-protein interactions, 1989, Nature.

[39] Maria Jesus Martin, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, 2003, Nucleic Acids Research.

[40] Ioannis Xenarios, et al. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions, 2002, Nucleic Acids Research.

[41] Gary D. Bader, et al. BIND: The Biomolecular Interaction Network Database, 2001, Nucleic Acids Research.

[42] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[43] Nello Cristianini, et al. Advances in Kernel Methods: Support Vector Learning, 1999.