Tracing data warehouse design lifecycle semantically

Data warehouses (DWs) are core parts of decision systems. Ontologies have contributed substantially to DW design because they capture the precise semantics of design artifacts. Designing a semantic DW involves several phases, each of which keeps evolving to satisfy new requirements brought by technological progress. Moving from one phase to the next requires significant processes and decisions made by the design actors. These decisions are usually lost once the DW is built, and tracing them is a challenging issue. However, traceability management in DW systems has not received the same attention as in software development. In this paper, we claim that ontologies can be an asset for managing DW traceability: they can semantically define design artifacts throughout the design cycle, and their reasoning capabilities can be used to identify new trace links. We propose an approach for semantic DW traceability that requires: (i) the formalization of each design phase, (ii) the identification of horizontal and vertical interactions (inside and between phases), and (iii) their semantic definition, storage and usage. The approach is composed of three main activities for managing the traces. It is illustrated using the LUBM benchmark, the Protégé editor and the Oracle semantic DBMS, and it is implemented in a CASE tool that assists the designer in managing DW traceability.

Highlights
- Semantic data warehouse (SDW) design is based on the availability of a domain ontology.
- Ontology (semantics and reasoning) is an important asset for defining a traceability strategy.
- The contributions include a traceability model and an approach for SDW design.
- A case study is proposed using the Oracle semantic DBMS.
- A prototype is implemented that supports the design process and the traceability approach.
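
The idea of using reasoning to identify new trace links can be illustrated with a minimal Python sketch. It is not the paper's implementation, which relies on ontology reasoning and the Oracle semantic DBMS. Here, simple transitive chaining stands in for the reasoner, and the reading of "horizontal" as a link inside one phase and "vertical" as a link between phases is an assumption drawn from the wording of the abstract. All class, function and artifact names (`Artifact`, `TraceLink`, `infer_links`, `PUBLICATION_FACT`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Artifact:
    """A design artifact attached to one design phase (hypothetical model)."""
    name: str
    phase: str  # e.g. "requirements", "conceptual", "logical"


@dataclass(frozen=True)
class TraceLink:
    """A directed trace link between two design artifacts."""
    source: Artifact
    target: Artifact

    @property
    def kind(self) -> str:
        # Assumption: horizontal = same phase, vertical = different phases.
        same_phase = self.source.phase == self.target.phase
        return "horizontal" if same_phase else "vertical"


def infer_links(links):
    """Return links derivable by transitive chaining but not stated explicitly.

    Transitive chaining is only a stand-in for ontology reasoning here.
    """
    known = set(links)
    changed = True
    while changed:
        changed = False
        for a, b in product(list(known), repeat=2):
            # Chain a -> b when a ends where b starts; skip self-links.
            if a.target == b.source and a.source != b.target:
                candidate = TraceLink(a.source, b.target)
                if candidate not in known:
                    known.add(candidate)
                    changed = True
    return known - set(links)


# Illustrative artifacts only; the names are not taken from the paper.
req = Artifact("R1: publications per department", "requirements")
concept = Artifact("Publication", "conceptual")
sub = Artifact("Article", "conceptual")
table = Artifact("PUBLICATION_FACT", "logical")

explicit = [
    TraceLink(req, concept),    # vertical: requirements -> conceptual
    TraceLink(sub, concept),    # horizontal: inside the conceptual phase
    TraceLink(concept, table),  # vertical: conceptual -> logical
]
inferred = infer_links(explicit)

for link in sorted(inferred, key=lambda l: l.source.name):
    print(f"{link.source.name} -> {link.target.name} ({link.kind})")
```

In this toy setting, the inferred links connect the requirement and the `Article` concept directly to the logical table, which is the kind of derived trace a designer could query after the DW is built.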
