Using Semantic Web Technologies for Exploratory OLAP: A Survey

This paper describes the convergence of some of the most influential technologies of the last few years, namely data warehousing (DW), on-line analytical processing (OLAP), and the Semantic Web (SW). Enterprises use OLAP to derive important, business-critical knowledge from data inside the company. However, the most interesting OLAP queries can no longer be answered on internal data alone: external data must also be discovered (most often on the web), acquired, integrated, and (analytically) queried, resulting in a new type of OLAP, exploratory OLAP. When using external data, an important issue is knowing the precise semantics of the data. Here, SW technologies come to the rescue, as they allow semantics (ranging from very simple to very complex) to be specified for web-available resources. SW technologies support not only capturing the "passive" semantics but also active inference and reasoning on the data. The paper first presents a characterization of DW/OLAP environments, followed by an introduction to the relevant SW foundation concepts. Then, it describes the relationship between multidimensional (MD) models and SW technologies, including SW formalisms. Next, the paper surveys the use of SW technologies for data modeling and data provisioning, including semantic data annotation and semantic-aware extract, transform, and load (ETL) processes. Finally, the paper discusses all the findings and outlines a number of directions for future research, including SW support for intelligent MD querying, using SW technologies to provide context to data warehouses, and scalability issues.
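To make the link between MD models and SW inference more concrete, here is a minimal Python sketch under toy assumptions. A dimension hierarchy (city, country, continent) and some sales facts are stored as (subject, predicate, object) triples. The roll-up link is treated as transitive, in the spirit of declaring an OWL transitive property, and an OLAP roll-up is then computed over the inferred links. The `ex:` prefix, the predicates `ex:rollsUpTo`, `ex:inLevel`, `ex:city`, and `ex:sales`, and the data values are all hypothetical. Plain tuples stand in for an RDF store. This is an illustration only, not the method of any surveyed approach.

```python
from collections import defaultdict

# Hypothetical triples: (subject, predicate, object). The "ex:" names are illustrative only.
TRIPLES = [
    # Dimension hierarchy: city -> country -> continent
    ("ex:Aalborg", "ex:rollsUpTo", "ex:Denmark"),
    ("ex:Barcelona", "ex:rollsUpTo", "ex:Spain"),
    ("ex:Madrid", "ex:rollsUpTo", "ex:Spain"),
    ("ex:Denmark", "ex:rollsUpTo", "ex:Europe"),
    ("ex:Spain", "ex:rollsUpTo", "ex:Europe"),
    # Level membership of dimension members
    ("ex:Denmark", "ex:inLevel", "ex:Country"),
    ("ex:Spain", "ex:inLevel", "ex:Country"),
    ("ex:Europe", "ex:inLevel", "ex:Continent"),
    # Facts (observations), each with a city and a sales measure
    ("ex:obs1", "ex:city", "ex:Aalborg"),
    ("ex:obs1", "ex:sales", 100),
    ("ex:obs2", "ex:city", "ex:Barcelona"),
    ("ex:obs2", "ex:sales", 250),
    ("ex:obs3", "ex:city", "ex:Madrid"),
    ("ex:obs3", "ex:sales", 150),
]


def transitive_closure(pairs):
    """Naive fixpoint: keep adding (a, d) for every (a, b), (b, d) until nothing changes."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def roll_up(triples, measure, level):
    """Sum `measure` per member of `level`, using the inferred (transitive) roll-up links."""
    rolls = transitive_closure({(s, o) for s, p, o in triples if p == "ex:rollsUpTo"})
    members = {s for s, p, o in triples if p == "ex:inLevel" and o == level}
    city_of = {s: o for s, p, o in triples if p == "ex:city"}
    value_of = {s: o for s, p, o in triples if p == measure}
    totals = defaultdict(int)
    for obs, city in city_of.items():
        for member in members:
            # A fact contributes to a member if its city is that member or rolls up to it.
            if city == member or (city, member) in rolls:
                totals[member] += value_of.get(obs, 0)
    return dict(totals)


print(roll_up(TRIPLES, "ex:sales", "ex:Country"))    # {'ex:Denmark': 100, 'ex:Spain': 400} (key order may vary)
print(roll_up(TRIPLES, "ex:sales", "ex:Continent"))  # {'ex:Europe': 500}
```

The naive fixpoint is adequate only for tiny hierarchies. Against a real triple store, a SPARQL 1.1 query could express a similar roll-up with a property path such as `ex:rollsUpTo+` combined with `GROUP BY` and `SUM`. The design choice shown here separates inference (materializing the transitive links) from aggregation, which mirrors the distinction between "passive" semantics and active reasoning drawn above.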
