Probabilistic prediction and ranking of human protein-protein interactions

Background: Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger human proteome. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-throughput experimental interaction datasets is as high as 90%.

Results: The prediction of human protein-protein interactions was investigated by combining orthogonal protein features within a probabilistic framework. The features include co-expression, orthology to known interacting proteins, and the full-Bayesian combination of subcellular localization, co-occurrence of domains and post-translational modifications. A novel scoring function for local network topology was also investigated. This topology feature greatly enhanced the predictions and, together with the full-Bayes combined features, made the largest contribution to the predictions. Using a conservative threshold, our most accurate predictor identifies 37606 human interactions, 32892 (80%) of which are not present in other publicly available large human interaction datasets, thus substantially increasing the coverage of the human interaction map. A subset of the 32892 novel predicted interactions has been independently validated. Comparison of the prediction dataset to other available human interaction datasets estimates the false positive rate of the new method to be below 80%, which is competitive with other methods. Since the new method scores and ranks all human protein pairs, smaller subsets of higher quality can be generated, leading to even lower false positive prediction rates.

Conclusion: The set of interactions predicted in this work increases the coverage of the human interaction map and will help determine the highest confidence human interactions.
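As an illustration of how independent evidence can be combined to score and rank protein pairs, the following is a minimal sketch of a naive-Bayes style combination of per-feature likelihood ratios. It is not the authors' implementation: the function names, the protein identifiers, the likelihood-ratio values, the prior odds and the threshold are all hypothetical, and the sketch assumes the features are conditionally independent.

```python
from math import prod

def posterior_odds(prior_odds, likelihood_ratios):
    """Naive-Bayes combination: posterior odds = prior odds * product of LRs.

    Assumes the features are conditionally independent given the
    interaction status of the pair.
    """
    return prior_odds * prod(likelihood_ratios)

def rank_pairs(pair_lrs, prior_odds):
    """Score every protein pair and return (pair, odds) sorted best first."""
    scored = {pair: posterior_odds(prior_odds, lrs)
              for pair, lrs in pair_lrs.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical likelihood ratios per pair, one value per feature
# (for example co-expression, orthology, local topology).
pairs = {
    ("A", "B"): [8.0, 5.0, 2.0],
    ("A", "C"): [0.5, 1.2, 1.0],
    ("B", "D"): [3.0, 0.9, 4.0],
}

# Hypothetical prior odds that a random pair interacts.
ranked = rank_pairs(pairs, prior_odds=1 / 400)

# A stricter threshold yields a smaller, higher-confidence subset.
confident = [pair for pair, odds in ranked if odds >= 0.1]

for pair, odds in ranked:
    print(pair, round(odds, 4))
```

Because every pair receives a score, raising the threshold trades coverage for confidence, which is the idea behind generating smaller subsets of higher quality.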


[38]  Warren C. Lathe,et al.  Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. , 2000, Genome research.

[39]  J. Wojcik,et al.  Functional proteomics mapping of a human signaling pathway. , 2004, Genome research.

[40]  Jean-Loup Faulon,et al.  Predicting protein-protein interactions using signature products , 2005, Bioinform..

[41]  Mark Gerstein,et al.  Analyzing cellular biochemistry in terms of molecular networks. , 2003, Annual review of biochemistry.

[42]  James R. Knight,et al.  A Protein Interaction Map of Drosophila melanogaster , 2003, Science.

[43]  Marie-France Carlier,et al.  IQGAP1 Stimulates Actin Assembly through the N-Wasp-Arp2/3 Pathway* , 2007, Journal of Biological Chemistry.

[44]  Erik L. L. Sonnhammer,et al.  Inparanoid: a comprehensive database of eukaryotic orthologs , 2004, Nucleic Acids Res..

[45]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.

[46]  Paul A. Bates,et al.  Cluster analysis of networks generated through homology: automatic identification of important protein communities involved in cancer metastasis , 2006, BMC Bioinformatics.

[47]  Adam J. Smith,et al.  The Database of Interacting Proteins: 2004 update , 2004, Nucleic Acids Res..

[48]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[49]  A. Fraser,et al.  A first-draft human protein-interaction map , 2004, Genome Biology.

[50]  D. Eisenberg,et al.  Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[51]  T. Barrette,et al.  Probabilistic model of the human protein-protein interaction network , 2005, Nature Biotechnology.

[52]  Geoffrey J. Barton,et al.  SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein–Protein Interactions , 2007, Nucleic Acids Res..

[53]  A. Valencia,et al.  Similarity of phylogenetic trees as indicator of protein-protein interaction. , 2001, Protein engineering.

[54]  William Stafford Noble,et al.  Kernel methods for predicting protein-protein interactions , 2005, ISMB.

[55]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[56]  George M. Church,et al.  Estimating and improving protein interaction error rates , 2004, Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004..

[57]  T. Gaasterland,et al.  Microbial genescapes: phyletic and functional patterns of ORF distribution among prokaryotes. , 1998, Microbial & comparative genomics.

[58]  Michelle S. Scott,et al.  Predicting subcellular localization via protein motif co-occurrence. , 2004, Genome research.

[59]  M. Vidal,et al.  Protein interaction mapping in C. elegans using proteins involved in vulval development. , 2000, Science.

[60]  M. Vidal,et al.  Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs". , 2001, Genome research.

[61]  A. Emili,et al.  Interaction network containing conserved and essential protein complexes in Escherichia coli , 2005, Nature.

[62]  Matthew A. Hibbs,et al.  Finding function: evaluation methods for functional genomic data , 2006, BMC Genomics.

[63]  M. Tyers,et al.  The GRID: The General Repository for Interaction Datasets , 2003, Genome Biology.

[64]  William Stafford Noble,et al.  Choosing negative examples for the prediction of protein-protein interactions , 2006, BMC Bioinformatics.

[65]  M. Gerstein,et al.  Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. , 2004, Genome research.

[66]  B. Snel,et al.  Comparative assessment of large-scale data sets of protein–protein interactions , 2002, Nature.

[67]  S. L. Wong,et al.  A Map of the Interactome Network of the Metazoan C. elegans , 2004, Science.

[68]  M. Gerstein,et al.  Assessing the limits of genomic data integration for predicting protein networks. , 2005, Genome research.

[69]  I. Lossos,et al.  T-Cell Protein Tyrosine Phosphatase, Distinctively Expressed in Activated-B-Cell-Like Diffuse Large B-Cell Lymphomas, Is the Nuclear Phosphatase of STAT6 , 2007, Molecular and Cellular Biology.

[70]  E. Sprinzak,et al.  Correlated sequence-signatures as markers of protein-protein interaction. , 2001, Journal of molecular biology.

[71]  Zoran Obradovic,et al.  Length-dependent prediction of protein intrinsic disorder , 2006, BMC Bioinformatics.

[72]  Cathy H. Wu,et al.  The Universal Protein Resource (UniProt): an expanding universe of protein information , 2005, Nucleic Acids Res..

[73]  Mark Gerstein,et al.  Predicting interactions in protein networks by completing defective cliques , 2006, Bioinform..

[74]  E. Birney,et al.  The International Protein Index: An integrated database for proteomics experiments , 2004, Proteomics.

[75]  Ioannis Xenarios,et al.  DIP: The Database of Interacting Proteins: 2001 update , 2001, Nucleic Acids Res..

[76]  Ziv Bar-Joseph,et al.  Evaluation of different biological data and computational classification methods for use in protein interaction prediction , 2006, Proteins.

[77]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.