Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

In today’s data-driven world, analytical querying, typically based on the data cube concept, is the cornerstone of answering important business questions and making data-driven decisions. Traditionally, the underlying analytical data was mostly internal to the organization and stored in relational data warehouses and data cubes. Today, external data sources are essential for analytics and, as the Semantic Web gains popularity, more and more external sources are available in native RDF. With the recent SPARQL 1.1 standard, performing analytical queries over RDF data sources has finally become feasible. However, unlike their relational counterparts, RDF data cube stores lack optimizations that enable fast querying. In this paper, we present an approach to optimizing RDF data cubes that is based on three novel cube patterns, together with associated algorithms that transform the RDF data cube accordingly. An extensive experimental evaluation shows that the approach allows trading additional storage and/or load time for significantly increased query performance. We further provide guidelines on which patterns to apply in specific scenarios.
