Converting neXtProt into Linked Data and nanopublications

The development of Linked Data gives databases the opportunity to supply extensive volumes of biological data, information, and knowledge in a machine-interpretable format, making previously isolated data silos interoperable. To increase ease of use, databases often incorporate annotations from several different resources. Linked Data can overcome many of the formatting and identifier issues that prevent data interoperability, but the extensive cross-incorporation of annotations between databases makes the tracking of provenance in open, decentralized systems especially important. Given the diversity of published data, provenance information becomes critical to providing reliable and trustworthy services to scientists. The nanopublication system addresses many of these challenges. We have developed the neXtProt Linked Data by serializing annotations specific to neXtProt in RDF/XML, and have begun employing the nanopublication model to give appropriate attribution to all data. Specifically, a use case demonstrates the handling of post-translational modification (PTM) data modeled as nanopublications, illustrating how the different levels of provenance and data quality thresholds can be captured in this model.
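
To make the nanopublication model concrete, the following is a minimal sketch of how a single PTM annotation might be split into the model's named graphs: an assertion, its provenance, and publication information, tied together by a head graph. The `np:` terms (`Nanopublication`, `hasAssertion`, `hasProvenance`, `hasPublicationInfo`) and the `prov:` terms come from the public nanopublication schema and W3C PROV. Everything under `http://example.org/` is a hypothetical placeholder: the protein and site identifiers, the predicates, the "gold" quality label, and the modification itself are assumptions for illustration, not actual neXtProt URIs or data. TriG is used only because it shows named graphs compactly; it is not claimed to be the serialization used for the neXtProt nanopublications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

NP = "http://www.nanopub.org/nschema#"      # nanopublication schema
PROV = "http://www.w3.org/ns/prov#"         # W3C PROV ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"                  # hypothetical namespace (assumption)


def term(value: str) -> str:
    """Render an already-quoted literal as-is, anything else as an IRI."""
    return value if value.startswith('"') else f"<{value}>"


@dataclass
class Nanopub:
    uri: str
    assertion: List[Triple] = field(default_factory=list)
    provenance: List[Triple] = field(default_factory=list)
    pubinfo: List[Triple] = field(default_factory=list)

    def graphs(self) -> Dict[str, List[Triple]]:
        """Return the four named graphs: head, assertion, provenance, pubinfo."""
        a, p, i = (self.uri + s for s in ("#assertion", "#provenance", "#pubinfo"))
        head = [
            (self.uri, RDF_TYPE, NP + "Nanopublication"),
            (self.uri, NP + "hasAssertion", a),
            (self.uri, NP + "hasProvenance", p),
            (self.uri, NP + "hasPublicationInfo", i),
        ]
        return {self.uri + "#head": head, a: self.assertion,
                p: self.provenance, i: self.pubinfo}

    def to_trig(self) -> str:
        """Serialize the named graphs as simple TriG blocks."""
        blocks = []
        for name, triples in self.graphs().items():
            body = "\n".join(f"  {term(s)} {term(p)} {term(o)} ." for s, p, o in triples)
            blocks.append(f"<{name}> {{\n{body}\n}}")
        return "\n\n".join(blocks)


def ptm_nanopub() -> Nanopub:
    """Build one illustrative PTM nanopublication (all EX terms are made up)."""
    uri = EX + "np/ptm-1"
    site = EX + "ptm/site-1"
    return Nanopub(
        uri=uri,
        # What is claimed: a protein carries a modification at a position.
        assertion=[
            (EX + "protein/P1", EX + "hasModification", site),
            (site, EX + "modificationType", '"phosphoserine"'),
            (site, EX + "position", '"15"'),
        ],
        # Where the claim comes from and how well supported it is.
        provenance=[
            (uri + "#assertion", PROV + "wasDerivedFrom", EX + "source/publication-1"),
            (uri + "#assertion", EX + "quality", '"gold"'),
        ],
        # Who published the nanopublication itself.
        pubinfo=[
            (uri, PROV + "wasAttributedTo", EX + "org/neXtProt"),
        ],
    )


nanopub = ptm_nanopub()
trig = nanopub.to_trig()
print(trig)
```

Keeping the quality label and the source in the provenance graph, rather than in the assertion, means a consumer could filter assertions by evidence level or origin without changing the biological statement itself.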
