Decentralized provenance-aware publishing with nanopublications

Publication and archival of scientific results is still commonly considered the responsibility of classical publishing companies. Classical forms of publishing, however, which center on printed narrative articles, no longer seem well suited to the digital age. In particular, there are currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. In this article, we propose to design scientific data publishing as a web-based bottom-up process, without top-down control by central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network that stores and archives data in a decentralized manner in the form of nanopublications, an RDF-based format for representing scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used as a low-level data publication layer to serve the Semantic Web in general. Our evaluation of the current network shows that this system is efficient and reliable.
