Reliable Granular References to Changing Linked Data

Nanopublications are a concept for representing Linked Data in a granular and provenance-aware manner, and they have been successfully applied to a number of scientific datasets. In previous work, we demonstrated how to establish reliable and verifiable identifiers for nanopublications and sets thereof. Further adoption of these techniques, however, was probably hindered by the fact that nanopublications can lead to an explosion in the number of triples, owing to auxiliary information about the structure of each nanopublication and to repetitive provenance and metadata. We demonstrate here that this significant overhead disappears once we take the version history of nanopublication datasets into account, calculate incremental updates, and allow users to deal with the specific subsets they need. We show that the total size and overhead of evolving scientific datasets are reduced, and that typical subsets that researchers use for their analyses can be referenced and retrieved efficiently with optimized precision, persistence, and reliability.
