Paper Information - Strategies for Efficiently Keeping Local Linked Open Data Caches Up-To-Date


Quite often, Linked Open Data (LOD) applications pre-fetch data from the Web and store local copies of it in a cache for faster access at runtime. Yet, recent investigations have shown that data published and interlinked on the LOD cloud is subject to frequent changes. As the data in the cloud changes, local copies of the data need to be updated. However, due to limitations of the available computational resources (e.g., network bandwidth for fetching data, computation time), LOD applications may not be able to continuously visit all of the LOD sources at brief intervals in order to check for changes. These limitations imply the need to prioritize which data sources should be considered first for retrieving their data and synchronizing the local copy with the original data. In order to make best use of the resources available, it is vital to choose a good scheduling strategy that determines when to fetch data from which data source. In this paper, we investigate different strategies proposed in the literature and evaluate them on a large-scale LOD dataset that is obtained from the LOD cloud by weekly crawls over the course of three years. We investigate two different setups: (i) in the single-step setup, we evaluate the quality of update strategies for a single and isolated update of a local data cache, while (ii) the iterative progression setup involves measuring the quality of the local data cache when considering iterative updates over a longer period of time. Our evaluation indicates the effectiveness of each strategy for updating local copies of LOD sources, i.e., we demonstrate, for given limitations of bandwidth, the strategies’ performance in terms of data accuracy and freshness. The evaluation shows that the measures capturing the change behavior of LOD sources over time are most suitable for conducting updates.

Ansgar Scherp | Thomas Gottron | Renata Queiroz Dividino
