Dimensional enrichment of statistical linked open data

On-Line Analytical Processing (OLAP) is a data analysis technique typically applied to local, well-prepared data. However, initiatives such as Open Data and Open Government bring new, publicly available data to the web that should be analyzed in the same way. The Linked Data initiative especially encourages the use of semantic web technologies in this context. A considerable number of statistical linked open data sets have already been published using the RDF Data Cube Vocabulary (QB), which is designed for this purpose. However, QB lacks some schema constructs that are essential for OLAP (e.g., dimension levels). The QB4OLAP vocabulary has therefore been proposed to extend QB with the necessary constructs and make it fully compliant with OLAP. In this paper, we focus on enriching an existing QB data set with QB4OLAP semantics. We first thoroughly compare the two vocabularies and outline the benefits of QB4OLAP. Then, we propose a series of steps to automate the enrichment of QB data sets with specific QB4OLAP semantics, the most important being the definition of aggregate functions and the detection of new concepts during dimension hierarchy construction. Together, these steps form a semi-automatic enrichment method, implemented in a tool that supports interactive and iterative enrichment. The user can enrich the QB data set with QB4OLAP concepts (e.g., full-fledged dimension hierarchies) by choosing among the candidate concepts that the proposed steps discover automatically. Finally, we evaluate our approach in experiments with 25 users and three real-world QB data sets. The evaluation demonstrates the feasibility of our approach and shows that, in practice, our tool facilitates and speeds up the enrichment process and guarantees correct results.
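
As an illustration of how candidate concepts for a dimension hierarchy might be discovered, the following minimal sketch treats a property as a candidate roll-up when it maps every dimension member to exactly one value and groups the members into fewer distinct values. This is an assumption made for illustration, not the method evaluated in the paper; the function name `candidate_levels`, the `ex:` identifiers, and the sample triples are all hypothetical.

```python
from collections import defaultdict

def candidate_levels(triples, members):
    """Return properties that look like many-to-one roll-ups.

    Hypothetical helper: a property qualifies if every member has
    exactly one value for it and there are fewer distinct values
    than members.
    """
    # property -> member -> set of object values
    values = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if s in members:
            values[p][s].add(o)

    candidates = {}
    for p, by_member in values.items():
        complete = set(by_member) == set(members)
        functional = all(len(objs) == 1 for objs in by_member.values())
        if complete and functional:
            parents = {next(iter(objs)) for objs in by_member.values()}
            if len(parents) < len(members):
                candidates[p] = sorted(parents)
    return candidates

# Hypothetical sample data: three country members and their properties.
triples = [
    ("ex:ES", "ex:continent", "ex:Europe"),
    ("ex:FR", "ex:continent", "ex:Europe"),
    ("ex:JP", "ex:continent", "ex:Asia"),
    ("ex:ES", "ex:label", "Spain"),
    ("ex:FR", "ex:label", "France"),
    ("ex:JP", "ex:label", "Japan"),
    ("ex:ES", "ex:neighbour", "ex:FR"),
    ("ex:ES", "ex:neighbour", "ex:PT"),
]
members = {"ex:ES", "ex:FR", "ex:JP"}

result = candidate_levels(triples, members)
print(result)
```

In this sample, only `ex:continent` qualifies: `ex:label` is one-to-one and so groups nothing, while `ex:neighbour` is multi-valued and missing for some members. In an interactive setting, such candidates would be offered to the user for confirmation rather than applied automatically.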
