Making Digital Artifacts on the Web Verifiable and Reliable

The current Web has no general mechanisms to make digital artifacts (such as datasets, code, texts, and images) verifiable and permanent. Moreover, for digital artifacts that are supposed to be immutable, there is no commonly accepted method to enforce this immutability. These shortcomings seriously impair the ability to reproduce the results of processes that rely on Web resources, which in turn heavily affects areas such as science, where reproducibility is important. To solve this problem, we propose trusty URIs, which contain cryptographic hash values. We show how trusty URIs can be used to verify digital artifacts, in a manner that is independent of the serialization format in the case of structured data files such as nanopublications. We demonstrate how the contents of these files become immutable, including their dependencies on external digital artifacts, thereby extending the range of verifiability to the entire reference tree. Our approach adheres to the core principles of the Web, namely openness and decentralized architecture, and is fully compatible with existing standards and protocols. Evaluation of our reference implementations shows that our approach accomplishes these design goals and that it remains practical even for very large files.
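The basic idea of a URI that carries a hash of the content it identifies can be illustrated with a minimal sketch. The code below is not the trusty URI specification or its reference implementation: the function names, the `.` separator, and the choice of SHA-256 with unpadded URL-safe Base64 are assumptions made for illustration only. It handles plain byte content and does not attempt the format-independent hashing of structured data mentioned above.

```python
import base64
import hashlib


def _content_code(content: bytes) -> str:
    """Return an unpadded URL-safe Base64 encoding of the SHA-256 digest."""
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_hash_uri(base_uri: str, content: bytes) -> str:
    """Append the content hash to a base URI (hypothetical layout: base + '.' + code)."""
    return f"{base_uri}.{_content_code(content)}"


def verify_hash_uri(uri: str, content: bytes) -> bool:
    """Check that the hash embedded in the URI matches the given content."""
    # The Base64 URL-safe alphabet contains no '.', so the last '.' marks the code.
    _, _, code = uri.rpartition(".")
    return code == _content_code(content)


if __name__ == "__main__":
    data = b"example dataset contents"
    uri = make_hash_uri("http://example.org/dataset1", data)
    print(uri)
    print(verify_hash_uri(uri, data))          # content unchanged
    print(verify_hash_uri(uri, data + b"!"))   # content modified
```

Because the hash is part of the identifier, anyone who retrieves the content from any location can recompute the hash and compare it with the URI. Any change to the content produces a different hash and therefore requires a new URI, which is what makes the identified artifact effectively immutable.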
