Towards Exploratory OLAP Over Linked Open Data - A Case Study

Business Intelligence (BI) tools provide fundamental support for analyzing large volumes of information. Data Warehouses (DW) and Online Analytical Processing (OLAP) tools are used to store and analyze data. A growing amount of information is now published on the Web as Resource Description Framework (RDF) data, and BI tools could achieve better results by integrating real-time data from such web sources into the analysis process. In this paper, we describe a framework for so-called exploratory OLAP over RDF sources. We propose a system that uses a multidimensional schema of the OLAP cube expressed in RDF vocabularies. Based on this schema, the system queries the data sources, extracts and aggregates the data, and builds the cube. We also propose a computer-aided process for discovering previously unknown data sources and building a multidimensional schema of the cube. We present a use case to demonstrate the applicability of the approach.
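The cube-building step summarized above (extract observations, then aggregate them according to a multidimensional schema) can be illustrated with a small Python sketch. This is a minimal sketch under stated assumptions, not the system described in the paper: the triples are an in-memory list rather than results fetched from web sources, the schema is a hypothetical Python dictionary rather than an RDF vocabulary, and every prefixed name (`ex:city`, `ex:inCountry`, `ex:sales`, ...) and the function `build_cube` are made-up placeholders.

```python
from collections import defaultdict

# Hypothetical RDF-style (subject, predicate, object) triples.
# All "ex:" names are illustrative placeholders, not a real vocabulary.
TRIPLES = [
    ("ex:obs1", "ex:city", "ex:Barcelona"), ("ex:obs1", "ex:year", "2020"), ("ex:obs1", "ex:sales", 10),
    ("ex:obs2", "ex:city", "ex:Madrid"),    ("ex:obs2", "ex:year", "2020"), ("ex:obs2", "ex:sales", 5),
    ("ex:obs3", "ex:city", "ex:Aalborg"),   ("ex:obs3", "ex:year", "2020"), ("ex:obs3", "ex:sales", 7),
    ("ex:obs4", "ex:city", "ex:Barcelona"), ("ex:obs4", "ex:year", "2021"), ("ex:obs4", "ex:sales", 4),
    # Dimension hierarchy: each city rolls up to a country.
    ("ex:Barcelona", "ex:inCountry", "ex:Spain"),
    ("ex:Madrid", "ex:inCountry", "ex:Spain"),
    ("ex:Aalborg", "ex:inCountry", "ex:Denmark"),
]


def index_by_subject(triples):
    """Group triples into {subject: {predicate: object}} (one value per predicate)."""
    subjects = defaultdict(dict)
    for s, p, o in triples:
        subjects[s][p] = o
    return dict(subjects)  # plain dict: lookups of unknown keys must not add entries


def build_cube(triples, dimensions, measures, rollups):
    """Aggregate observations into cube cells (hypothetical helper).

    dimensions: predicates linking an observation to a dimension member
    measures:   {measure predicate: aggregate function}
    rollups:    {dimension predicate: predicate linking a member to its parent level}
    """
    by_subject = index_by_subject(triples)
    groups = defaultdict(lambda: defaultdict(list))
    for props in by_subject.values():
        # Treat only subjects carrying every dimension and measure as observations.
        if not all(p in props for p in list(dimensions) + list(measures)):
            continue
        key = []
        for d in dimensions:
            member = props[d]
            parent_pred = rollups.get(d)
            if parent_pred is not None:
                # Roll the member up one level, e.g. city -> country.
                member = by_subject.get(member, {}).get(parent_pred)
            key.append(member)
        if None in key:
            continue  # skip observations whose member has no parent in the hierarchy
        for m in measures:
            groups[tuple(key)][m].append(props[m])
    return {
        key: {m: agg(values[m]) for m, agg in measures.items()}
        for key, values in groups.items()
    }


cube = build_cube(
    TRIPLES,
    dimensions=["ex:city", "ex:year"],
    measures={"ex:sales": sum},
    rollups={"ex:city": "ex:inCountry"},
)
for cell, values in sorted(cube.items()):
    print(cell, values)
```

Rolling `ex:city` up through `ex:inCountry` yields one cell per (country, year) pair. In a system like the one described, the dimension, measure, and roll-up information would instead come from the schema expressed in RDF vocabularies, and the triples from the queried data sources.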
