GEM: Requirement-Driven Generation of ETL and Multidimensional Conceptual Designs

At the early stages of a data warehouse design project, the main objective is to collect the business requirements and needs and translate them into an appropriate conceptual multidimensional design. Typically, this task is performed manually, through a series of interviews involving two parties: business analysts and technical designers. Producing an appropriate conceptual design is an error-prone task that undergoes several rounds of reconciliation and redesign until the business needs are satisfied. Facilitating and automating this process is therefore of great importance to an enterprise. The goal of our research is to provide designers with a semi-automatic means of producing conceptual multidimensional designs, as well as conceptual representations of the extract-transform-load (ETL) processes that orchestrate the data flow from the operational sources to the data warehouse constructs. In particular, we describe a method that combines information about the data sources with the business requirements in order to validate and, if necessary, complete these requirements, produce a multidimensional design, and identify the ETL operations needed. We illustrate our method using the TPC-DS benchmark and show its applicability and usefulness.
