Large scale statistical prediction of protein-protein interaction by potentially interacting domain (PID) pair.

Protein-protein interactions play a critical role in biological processes. Identifying interacting proteins by computational methods can provide new leads for functional studies of uncharacterized proteins without extensive experiments. We developed a database of potentially interacting domain (PID) pairs, extracted by combining a dataset of experimentally identified interacting protein pairs from DIP (the Database of Interacting Proteins) with InterPro, an integrated database of protein families, domains and functional sites. In developing protein interaction databases and predictive methods, sensitive statistical scoring systems are critical for providing a reliability index that supports accurate functional analysis of interaction networks. We present a statistical scoring system, named the "PID matrix score", as a measure of the interaction probability (interactability) between domains. This system provides a valuable tool for the functional prediction of unknown proteins. To evaluate the PID matrix, cross-validation was performed with subsets of the DIP data. The prediction system achieves about 50% sensitivity and more than 98% specificity, which implies that information on interacting protein pairs could be enriched about 30-fold with the PID matrix. We demonstrate that the genome-wide interaction network can be mapped using the PID matrix.
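The abstract does not spell out how the PID matrix score is computed, so the following Python sketch assumes a simple version of domain-pair scoring. Each protein carries a list of InterPro-style domain identifiers. A domain pair's score is the fraction of all protein pairs carrying it that are known to interact, and a protein pair is predicted to interact when its best domain-pair score reaches a threshold. The function names (`pid_scores`, `predict`, `evaluate`, `cross_validate`), the scoring formula, the threshold and the toy domain labels are all hypothetical. The sketch only illustrates the general shape of domain-pair scoring with k-fold cross-validation and the sensitivity, specificity and enrichment measures the abstract reports.

```python
from collections import Counter
from itertools import combinations, product


def domain_pairs(doms_a, doms_b):
    """Unordered domain pairs formed by taking one domain from each protein."""
    return {tuple(sorted(p)) for p in product(doms_a, doms_b)}


def pid_scores(domains, interacting):
    """Hypothetical PID-style score: for each domain pair, the fraction of all
    protein pairs carrying it that are listed as interacting. This formula is
    an assumption for illustration, not the published PID matrix score."""
    interacting = {frozenset(p) for p in interacting}
    hits, totals = Counter(), Counter()
    for a, b in combinations(sorted(domains), 2):  # self-pairs (homodimers) skipped
        pairs = domain_pairs(domains[a], domains[b])
        totals.update(pairs)
        if frozenset((a, b)) in interacting:
            hits.update(pairs)
    return {dp: hits[dp] / n for dp, n in totals.items()}


def predict(domains, scores, pair, threshold):
    """Call a protein pair interacting if its best domain-pair score reaches threshold."""
    a, b = sorted(pair)
    best = max((scores.get(dp, 0.0) for dp in domain_pairs(domains[a], domains[b])),
               default=0.0)
    return best >= threshold


def evaluate(predicted, actual, universe):
    """Sensitivity, specificity and enrichment (precision divided by prevalence)."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(universe) - tp - fp - fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    enrichment = precision / (len(actual) / len(universe))
    return sensitivity, specificity, enrichment


def cross_validate(domains, interacting, k, threshold):
    """k-fold cross-validation (assumes k <= number of known interactions):
    score domain pairs on k-1 folds, then predict the held-out fold. Training
    positives are excluded from the evaluation set so they are not counted
    as false positives."""
    positives = sorted({frozenset(p) for p in interacting}, key=sorted)
    universe = {frozenset(p) for p in combinations(sorted(domains), 2)}
    results = []
    for i in range(k):
        held_out = set(positives[i::k])
        train = [p for p in positives if p not in held_out]
        scores = pid_scores(domains, train)
        eval_set = universe - set(train)
        predicted = {p for p in eval_set if predict(domains, scores, p, threshold)}
        results.append(evaluate(predicted, held_out, eval_set))
    return results


# Toy data with made-up domain labels, for illustration only.
domains = {"P1": ["SH3"], "P2": ["PRM"], "P3": ["SH3"], "P4": ["PRM"], "P5": ["KIN"]}
interacting = [("P1", "P2"), ("P3", "P4")]
scores = pid_scores(domains, interacting)
print(scores[("PRM", "SH3")])  # 0.5
for sens, spec, enr in cross_validate(domains, interacting, k=2, threshold=0.2):
    # each fold: sensitivity=1.00 specificity=0.75 enrichment=3.0
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} enrichment={enr:.1f}")
```

Enrichment here is precision divided by the fraction of evaluated pairs that truly interact. When interacting pairs are rare among all candidate pairs, this ratio is approximately sensitivity / (1 − specificity): 0.5 / 0.02 = 25, rising to about 30 at a specificity of roughly 98.3%. This is one plausible reading of the abstract's 30-fold figure, not a calculation stated in the source.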
