Amino-Acid Residue Association Models for Large Scale Protein-Protein Interaction Prediction

The computational prediction of protein-protein interactions (PPI) is an essential complement to direct experimental evidence. Traditional approaches rely on surface properties that are less readily available or are themselves computationally predicted, show database-specific performance, and are computationally expensive for large-scale datasets. Several sensitivity and specificity issues remain. Here, we report a novel method based on 'Amino-acid Residue Associations' (ARA) among interacting proteins, which uses the accurate and easily available primary sequence. Large-scale PPI datasets for six model species (from E. coli to human) were studied. The ARA method shows up to 73% sensitivity and 78% specificity. Furthermore, the method performs remarkably well in terms of stability and generalizability. Benchmarked against existing prediction techniques, the ARA method shows a performance improvement of up to 25%. The ability of the ARA method to predict PPI across species and across databases is also demonstrated. Overall, the ARA method provides a significant improvement over existing ones in correctly identifying large-scale protein-protein interactions, irrespective of the data resource, network size or organism. AVAILABILITY: The MATLAB code for the ARA approach will be made available upon request.
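
For readers unfamiliar with the two headline metrics, the following is a minimal Python sketch of how sensitivity and specificity are conventionally computed for a binary interaction predictor. It is not the authors' MATLAB code; the function name `sensitivity_specificity` and the toy labels are hypothetical and serve only to illustrate the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    y_true, y_pred: equal-length sequences of 0/1 values, where 1 means
    "interacting pair" and 0 means "non-interacting pair".
    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # interacting pair correctly predicted
        elif truth == 1 and pred == 0:
            fn += 1  # interacting pair missed
        elif truth == 0 and pred == 0:
            tn += 1  # non-interacting pair correctly rejected
        else:
            fp += 1  # non-interacting pair wrongly predicted as interacting

    # Guard against empty classes; NaN signals "undefined" rather than 0.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (made up for illustration, not data from the study).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

Reporting both values matters in PPI prediction because non-interacting pairs usually far outnumber interacting ones, so a single accuracy figure can hide poor recall of true interactions.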
