The contribution of linked open data to augment a traditional data warehouse

The arrival of Big Data has contributed positively to the evolution of data warehouse (DW) technology. It has given birth to augmented DWs, which aim at maximizing the effectiveness of existing ones. Various augmentation scenarios have been proposed and adopted by firms and industry, covering several aspects such as new data sources (e.g., Linked Open Data (LOD), social, stream, and IoT data), data ingestion, advanced deployment infrastructures, programming paradigms, and data visualization. These scenarios allow companies to reach valuable decisions. By examining traditional DWs, we realized that they do not fulfill all decision-maker requirements, since the data sources feeding a target DW are not rich enough to capture Big Data. The arrival of the LOD era is an excellent opportunity to enrich traditional DWs with a new V dimension: Value. In this paper, we first conceptualize the variety of internal and external sources and study its effect on the ETL phase to ease value capture. Secondly, we discuss a Value-driven approach for DW design. Thirdly, we give three realistic scenarios for integrating LOD into the DW landscape. Finally, we conduct experiments showing the value added by augmenting the existing DW environment with LOD.
