No Size Fits All - Running the Star Schema Benchmark with SPARQL and RDF Aggregate Views

Statistics published as Linked Data promise efficient extraction, transformation and loading (ETL) into a database for decision support. In industry, analytical query capabilities are predominantly implemented by specialised engines that translate OLAP queries into SQL queries on a relational database using a star schema (ROLAP). A more direct approach than ROLAP is to load Statistical Linked Data into an RDF store and to answer OLAP queries using SPARQL. However, we assume that general-purpose triple stores, like typical relational databases, are not a perfect fit for analytical workloads and need to be complemented by OLAP-to-SPARQL engines. To give an empirical argument for the need for such an engine, we first compare the performance of the SPARQL queries we generate with that of ROLAP SQL queries. Second, we measure the performance gain of RDF aggregate views that, similar to aggregate tables in ROLAP, materialise parts of the data cube.
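The two ideas above, translating an OLAP query into SPARQL and materialising part of the cube as an RDF aggregate view, can be illustrated with a small query generator. This is a minimal sketch under stated assumptions, not the authors' OLAP-to-SPARQL engine: it assumes observations modelled with the W3C RDF Data Cube vocabulary (`qb:`), a hypothetical `ex:` namespace with hypothetical properties such as `ex:custNation`, `ex:orderYear` and `ex:revenue` standing in for Star Schema Benchmark attributes, and hypothetical helpers `OlapQuery`, `to_sparql` and `to_aggregate_view`. It groups directly on dimension values and ignores hierarchy levels, which a real engine would have to map. The view is expressed as a SPARQL 1.1 `INSERT ... WHERE` that stores each aggregated row as a new observation of a separate data set, analogous to an aggregate table in ROLAP.

```python
from dataclasses import dataclass, field

# qb: is the W3C RDF Data Cube vocabulary; ex: is a HYPOTHETICAL namespace
# standing in for the Star Schema Benchmark cube.
PREFIXES = (
    "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
    "PREFIX ex: <http://example.org/ssb#>\n"
)


@dataclass
class OlapQuery:
    """Hypothetical OLAP query: fix some dimensions (slice), roll up to group_by."""
    dataset: str                                # qb:DataSet to query, e.g. "ex:lineorderCube"
    group_by: list                              # dimension properties kept in the result
    measures: dict                              # measure property -> SPARQL aggregate, e.g. "SUM"
    slices: dict = field(default_factory=dict)  # dimension property -> fixed member


def var(prop: str) -> str:
    """Map a prefixed name such as "ex:custNation" to the variable "?custNation"."""
    return "?" + prop.split(":", 1)[1]


def where_patterns(q: OlapQuery) -> str:
    """Triple patterns matching the observations of the data set."""
    lines = [f"?obs qb:dataSet {q.dataset} ."]
    lines += [f"?obs {d} {var(d)} ." for d in q.group_by]
    lines += [f"?obs {d} {member} ." for d, member in q.slices.items()]
    lines += [f"?obs {m} {var(m)} ." for m in q.measures]
    return "\n    ".join(lines)


def to_sparql(q: OlapQuery) -> str:
    """Translate the OLAP query into a single SPARQL 1.1 aggregate query."""
    select = [var(d) for d in q.group_by]
    select += [f"({agg}({var(m)}) AS {var(m)}_{agg.lower()})"
               for m, agg in q.measures.items()]
    group = ""
    if q.group_by:
        group = "\nGROUP BY " + " ".join(var(d) for d in q.group_by)
    return (PREFIXES + "SELECT " + " ".join(select) + "\n"
            + "WHERE {\n    " + where_patterns(q) + "\n}" + group)


def to_aggregate_view(q: OlapQuery, view_dataset: str) -> str:
    """SPARQL 1.1 Update that materialises the roll-up as a new data set."""
    props = ["a qb:Observation", f"qb:dataSet {view_dataset}"]
    props += [f"{d} {var(d)}" for d in q.group_by]
    props += [f"{m} {var(m)}_{agg.lower()}" for m, agg in q.measures.items()]
    # The aggregate query becomes a subquery; PREFIX declarations belong only
    # at the top of the update request, so strip them from the subquery.
    subquery = to_sparql(q)[len(PREFIXES):]
    # The blank node _:o in the template is instantiated afresh for every
    # solution, i.e. one new view observation per aggregated row.
    return (PREFIXES + "INSERT {\n    _:o " + " ;\n        ".join(props) + " .\n}\n"
            + "WHERE {\n" + subquery + "\n}")


if __name__ == "__main__":
    # Revenue per customer nation for one (hypothetical) order year.
    q = OlapQuery(dataset="ex:lineorderCube",
                  group_by=["ex:custNation"],
                  measures={"ex:revenue": "SUM"},
                  slices={"ex:orderYear": "ex:year1993"})
    print(to_sparql(q))
    print(to_aggregate_view(q, "ex:revenueByNation"))
```

Once such a view is loaded, a query at the same granularity could be run against `ex:revenueByNation` instead of `ex:lineorderCube`, touching one pre-aggregated observation per nation rather than every base observation. Choosing which views to materialise and keeping them consistent under updates are separate problems that this sketch does not attempt.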