Estimating the Dynamics of SPARQL Query Results Using Binary Classification

We address the problem of estimating when the results of an input SPARQL query over dynamic RDF datasets will change. We evaluate a framework that extracts features from the query and/or from past versions of the target dataset and feeds them into binary classifiers to predict whether or not the results for a query will change at a fixed point in the near future. For this evaluation, we create a gold standard based on 23 versions of Wikidata and a curated collection of 221 SPARQL queries. Our results show that predictions based only on features of the query structure and lightweight statistics of the predicate dynamics, though capable of beating a random baseline, are not competitive with predictions based on (more costly to derive) knowledge of the complete historical changes in the query results.
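
The two kinds of features contrasted above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the specific features, the regular expression for Wikidata predicates, and the toy change rates are hypothetical and not taken from the paper. The fixed threshold rule merely stands in for a trained binary classifier.

```python
import re


def query_features(query, predicate_change_rate):
    """Hypothetical lightweight features: query structure plus predicate dynamics.

    predicate_change_rate maps a predicate (e.g. "wdt:P31") to an assumed
    fraction of past version pairs in which triples with that predicate changed.
    """
    predicates = re.findall(r"wdt:P\d+", query)
    rates = [predicate_change_rate.get(p, 0.0) for p in predicates]
    upper = query.upper()
    return {
        "n_predicates": float(len(predicates)),
        "has_optional": float("OPTIONAL" in upper),
        "has_filter": float("FILTER" in upper),
        "max_change_rate": max(rates, default=0.0),
    }


def history_feature(past_changed):
    """Fraction of past version pairs in which the query results changed.

    This requires re-running the query on every past version, which is the
    costlier signal mentioned in the abstract.
    """
    return sum(past_changed) / len(past_changed) if past_changed else 0.0


def predict_change(score, threshold=0.5):
    """Stand-in for a binary classifier: predict 'will change' above a threshold."""
    return score >= threshold


# Toy inputs (illustrative values only, not real measurements).
query = "SELECT ?x WHERE { ?x wdt:P31 wd:Q5 . OPTIONAL { ?x wdt:P569 ?d } }"
rates = {"wdt:P31": 0.2, "wdt:P569": 0.8}

feats = query_features(query, rates)
lightweight_prediction = predict_change(feats["max_change_rate"])
history_prediction = predict_change(history_feature([True, False, True, True]))
```

In practice the feature dictionary would be turned into a vector and passed to a trained classifier; the threshold rule here only shows where each kind of feature would enter the prediction.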
