Literature curation of protein interactions: measuring agreement across major public databases

Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these differences are well known, but their global impact on the interactions collectively curated by major public databases has not been evaluated. Here we quantify the agreement between curated interactions from 15 471 publications shared across nine major public databases. Results show that on average, two databases fully agree on 42% of the interactions and 62% of the proteins curated from the same publication. Furthermore, a sizable fraction of the measured differences can be attributed to divergent assignments of organism or splice isoforms, different organism focus and alternative representations of multi-protein complexes. Our findings highlight the impact of divergent curation policies across databases, and should be relevant to both curators and data consumers interested in analyzing protein-interaction data generated by the scientific community. Database URL: http://wodaklab.org/iRefWeb
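As a rough illustration of what measuring agreement between two databases on a shared publication could look like, the sketch below scores each shared publication by the fraction of interactions recorded by both databases out of all interactions recorded by either (a Jaccard overlap), then averages over publications. The data layout, the function names and the choice of Jaccard overlap are assumptions made for illustration; the abstract does not specify the measure actually used.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared items divided by all items recorded by either database."""
    union = a | b
    # Two empty sets are treated as full agreement (an arbitrary convention).
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_agreement(records):
    """records: {database: {publication_id: set of interactions}} (hypothetical layout).

    Returns {(db1, db2): mean Jaccard overlap over publications curated by both}.
    Database pairs with no shared publication are omitted.
    """
    result = {}
    for db1, db2 in combinations(sorted(records), 2):
        shared = records[db1].keys() & records[db2].keys()
        if shared:
            scores = [jaccard(records[db1][p], records[db2][p]) for p in shared]
            result[(db1, db2)] = sum(scores) / len(scores)
    return result

# Toy data: each interaction is an unordered pair of protein identifiers.
records = {
    "dbA": {"pmid1": {frozenset({"P1", "P2"}), frozenset({"P1", "P3"})}},
    "dbB": {"pmid1": {frozenset({"P1", "P2"})},
            "pmid2": {frozenset({"P4", "P5"})}},
}
agreement = mean_pairwise_agreement(records)
```

Representing an interaction as a `frozenset` makes the pair order-independent, so A-B and B-A count as the same record. Applying the same function to sets of protein identifiers instead of pairs would give a protein-level score. Divergent organism or isoform assignments would surface here as mismatched identifiers unless they are normalized first.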

[1]  Gary D Bader,et al.  BMC Biology BioMed Central , 2007 .

[2]  J. Manley,et al.  Functional interaction of BRCA1-associated BARD1 with polyadenylation factor CstF-50. , 1999, Science.

[3]  M. Vidal,et al.  Literature-curated protein interaction , 2009 .

[4]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[5]  Grant W. Brown,et al.  Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map , 2007, Nature.

[6]  Gabriele Ausiello,et al.  MINT: the Molecular INTeraction database , 2006, Nucleic Acids Res..

[7]  Kumaran Kandasamy,et al.  An evaluation of human protein-protein interaction data in the public domain , 2006, BMC Bioinformatics.

[8]  Rafael C. Jimenez,et al.  The IntAct molecular interaction database in 2012 , 2011, Nucleic Acids Res..

[9]  Winston A Hide,et al.  Big data: The future of biocuration , 2008, Nature.

[10]  Hans-Werner Mewes,et al.  MPact: the MIPS protein interaction resource on yeast , 2005, Nucleic Acids Res..

[11]  Ralf Herwig,et al.  ConsensusPathDB—a database for integrating human functional interaction networks , 2008, Nucleic Acids Res..

[12]  Ian M. Donaldson,et al.  BIND: the Biomolecular Interaction Network Database , 2001, Nucleic Acids Res..

[13]  Christian von Mering,et al.  STRING: known and predicted protein–protein associations, integrated and transferred across organisms , 2004, Nucleic Acids Res..

[14]  Henning Hermjakob,et al.  Submit Your Interaction Data the IMEx Way , 2007, Proteomics.

[15]  Carlos Prieto,et al.  APID: Agile Protein Interaction DataAnalyzer , 2006, Nucleic Acids Res..

[16]  Adam J. Smith,et al.  The Database of Interacting Proteins: 2004 update , 2004, Nucleic Acids Res..

[17]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[18]  M. Vidal,et al.  Addendum: Literature-curated protein interaction datasets , 2009, Nature Methods.

[19]  Zhilei Chen,et al.  A highly sensitive selection method for directed evolution of homing endonucleases , 2005, Nucleic acids research.

[20]  Igor Jurisica,et al.  Online Predicted Human Interaction Database , 2005, Bioinform..

[21]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.

[22]  Luana Licata,et al.  Linking entries in protein interaction database to structured text: The FEBS Letters experiment , 2008, FEBS letters.

[23]  Hanno Steen,et al.  Development of human protein reference database as an initial platform for approaching systems biology in humans. , 2003, Genome research.

[24]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[25]  M. Vidal,et al.  Literature-curated protein interaction datasets , 2009, Nature Methods.

[26]  James R. Knight,et al.  A Protein Interaction Map of Drosophila melanogaster , 2003, Science.

[27]  A. Valencia,et al.  Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge , 2008, Genome Biology.

[28]  Marc Vidal,et al.  Array MAPPIT: high-throughput interactome analysis in mammalian cells. , 2009, Journal of proteome research.

[29]  S. L. Wong,et al.  Towards a proteome-scale map of the human protein–protein interaction network , 2005, Nature.

[30]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[31]  Keith D Wilkinson,et al.  BAP1: a novel ubiquitin hydrolase which binds to the BRCA1 RING finger and enhances BRCA1-mediated cell growth suppression , 1998, Oncogene.

[32]  Peter Woollard,et al.  The minimum information required for reporting a molecular interaction experiment (MIMIx) , 2007, Nature Biotechnology.

[33]  Ian M. Donaldson,et al.  iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence , 2010, Database J. Biol. Databases Curation.

[34]  Yang Ke,et al.  Human Papillomavirus 16 E6 Oncoprotein Interferences with Insulin Signaling Pathway by Binding to Tuberin* , 2004, Journal of Biological Chemistry.

[35]  Erich E. Wanker,et al.  UniHI 4: new tools for query, analysis and visualization of the human protein–protein interactome , 2008, Nucleic Acids Res..

[36]  P. Bork,et al.  Proteome Organization in a Genome-Reduced Bacterium , 2009, Science.

[37]  William Stafford Noble,et al.  Large-scale identification of yeast integral membrane protein interactions. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[38]  Dmitrij Frishman,et al.  The MIPS mammalian protein?Cprotein interaction database , 2005, Bioinform..

[39]  S. Fields,et al.  The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[40]  Hans-Werner Mewes,et al.  CORUM: the comprehensive resource of mammalian protein complexes , 2007, Nucleic Acids Res..

[41]  Ian M. Donaldson,et al.  iRefIndex: A consolidated protein interaction database with provenance , 2008, BMC Bioinformatics.

[42]  Karl-Heinz Krause,et al.  BARD1 induces apoptosis by catalysing phosphorylation of p53 by DNA-damage response kinase , 2005, Oncogene.

[43]  Gary D. Bader,et al.  Pathguide: a Pathway Resource List , 2005, Nucleic Acids Res..

[44]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[45]  M. Ashburner,et al.  Calling on a million minds for community annotation in WikiProteins , 2008, Genome Biology.

[46]  Gary D Bader,et al.  Global Mapping of the Yeast Genetic Interaction Network , 2004, Science.

[47]  Jignesh M. Patel,et al.  Michigan molecular interactions r2: from interacting proteins to pathways , 2008, Nucleic Acids Res..

[48]  A. Valencia,et al.  A text‐mining perspective on the requirements for electronically annotated abstracts , 2008, FEBS letters.

[49]  Martin Vingron,et al.  IntAct: an open source molecular interaction database , 2004, Nucleic Acids Res..

[50]  P. Bork,et al.  Literature mining for the biologist: from information retrieval to biological discovery , 2006, Nature Reviews Genetics.

[51]  Gregory D. Schuler,et al.  Database resources of the National Center for Biotechnology Information: update , 2004, Nucleic acids research.

[52]  Michael S. Livstone,et al.  Recurated protein interaction datasets , 2009, Nature Methods.

[53]  Gary D Bader,et al.  Analyzing yeast protein–protein interaction data obtained from different sources , 2002, Nature Biotechnology.

[54]  Anne-Claude Gavin,et al.  The social network of a cell: recent advances in interactome mapping. , 2008, Biotechnology annual review.

[55]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[56]  A. Fraser,et al.  Systematic mapping of genetic interactions in Caenorhabditis elegans identifies common modifiers of diverse signaling pathways , 2006, Nature Genetics.