Towards Answering Provenance-Enabled SPARQL Queries Over RDF Data Cubes

The SPARQL 1.1 standard has made it possible to formulate analytical queries in SPARQL. While some approaches for processing analytical queries on RDF data cubes have become available, little attention has been paid to answering provenance-enabled queries over such data. Yet considering provenance is a prerequisite for validating whether a query result is trustworthy. The main challenge for existing triple stores lies in how provenance is encoded in them, namely by means of context values (named graphs). Hence, in this paper we analyze the suitability of existing triple stores for answering provenance-enabled queries on RDF data cubes, identify their shortcomings, and propose an index to handle the large number of context values that provenance encoding typically entails. Our experimental results using the Star Schema Benchmark demonstrate the feasibility and scalability of our index and query evaluation strategies.
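To illustrate why context values matter, the following Python sketch models provenance encoded as named graphs. Each observation triple carries a context value, and a separate mapping records the source of each graph. A provenance-enabled aggregate query then sums a measure only over graphs whose source is trusted, roughly what a SPARQL `GRAPH ?g { ... }` pattern with `?g` restricted to a set of graphs expresses. The class `ContextIndex`, its methods, the `provenance` mapping, and the sample data are illustrative assumptions, not the index proposed in the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Quad(NamedTuple):
    s: str
    p: str
    o: object
    g: str  # context value (named graph) that carries provenance


class ContextIndex:
    """Hypothetical index that groups quads by context value so a
    provenance filter can select or discard whole graphs at once."""

    def __init__(self, quads, provenance):
        # Group every quad under its named graph.
        self.by_graph = defaultdict(list)
        for q in quads:
            self.by_graph[q.g].append(q)
        # provenance: hypothetical mapping graph -> source identifier.
        self.graphs_by_source = defaultdict(set)
        for graph, source in provenance.items():
            self.graphs_by_source[source].add(graph)

    def graphs_from(self, sources):
        """Return the context values whose provenance is in `sources`."""
        result = set()
        for src in sources:
            result |= self.graphs_by_source.get(src, set())
        return result

    def sum_measure(self, predicate, sources):
        """Sum a numeric measure over quads from trusted graphs only."""
        total = 0
        for graph in self.graphs_from(sources):
            for q in self.by_graph.get(graph, ()):
                if q.p == predicate:
                    total += q.o
        return total


if __name__ == "__main__":
    quads = [
        Quad("obs1", "revenue", 100, "g1"),
        Quad("obs2", "revenue", 250, "g2"),
        Quad("obs3", "revenue", 40, "g3"),
        Quad("obs1", "quantity", 5, "g1"),
    ]
    provenance = {"g1": "sourceA", "g2": "sourceB", "g3": "sourceA"}
    idx = ContextIndex(quads, provenance)
    print(idx.sum_measure("revenue", {"sourceA"}))  # 100 + 40 = 140
```

Grouping quads by context value lets the provenance filter decide per graph rather than per triple. When provenance is recorded at fine granularity, however, the number of graphs, and thus the number of context values to resolve, grows accordingly.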
