Capturing the Age of Linked Open Data: Towards a Dataset-Independent Framework

An increasing amount of data is published and consumed on the Web according to the Linked Data paradigm. In such a scenario, understanding whether the data consumed are up to date is crucial. Outdated data are usually considered inappropriate for many crucial tasks, such as making the consumer confident that the answers returned to a query are still valid at the time the query is formulated. In this paper we present a first dataset-independent framework for assessing the currency of Linked Open Data (LOD) graphs. Starting from an analysis of the 8,713,282 triples containing temporal metadata in the Billion Triple Challenge 2011 dataset, we investigate which vocabularies are used to represent versioning metadata and define Onto Currency, an ontology that integrates the most frequent properties used in this domain and supports the collection of metadata from datasets that use different vocabularies. The proposed framework uses this ontology to assess the currency of an RDF graph or statement by extrapolating it from the currency of the documents that describe the resources occurring in that graph or statement. The approach has been implemented and evaluated in two different scenarios.
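The idea of extrapolating the currency of a graph or statement from the documents that describe its resources can be illustrated with a minimal sketch. It is not the framework's implementation: the function names, the use of a last-modified timestamp per describing document, the measurement of age in days, and the choice of the mean as the aggregation rule are all assumptions made here for illustration.

```python
from datetime import datetime, timezone
from statistics import mean

SECONDS_PER_DAY = 86400


def document_age_days(last_modified, now):
    """Age of a document: time elapsed since its last modification, in days."""
    return (now - last_modified).total_seconds() / SECONDS_PER_DAY


def estimate_age_days(resources, last_modified_by_resource, now, aggregate=mean):
    """Estimate the age of a graph or statement from its resources.

    Assumption: each resource is described by one document with a known
    last-modified timestamp, and the ages are combined with `aggregate`
    (mean by default; max would give a pessimistic estimate).
    Resources without temporal metadata are skipped.
    """
    ages = [
        document_age_days(last_modified_by_resource[r], now)
        for r in resources
        if r in last_modified_by_resource
    ]
    if not ages:
        return None  # no temporal metadata available for any resource
    return aggregate(ages)


# Hypothetical resources and timestamps, for illustration only.
now = datetime(2012, 1, 11, tzinfo=timezone.utc)
modified = {
    "ex:subject": datetime(2012, 1, 1, tzinfo=timezone.utc),
    "ex:object": datetime(2012, 1, 6, tzinfo=timezone.utc),
}
statement_resources = ["ex:subject", "ex:object"]
mean_age = estimate_age_days(statement_resources, modified, now)
worst_age = estimate_age_days(statement_resources, modified, now, aggregate=max)
```

A lower estimated age corresponds to more current data. Returning `None` when no resource carries temporal metadata keeps "unknown" distinct from "fresh".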
