Integrating ETL Processes from Information Requirements

Data warehouse (DW) design is based on a set of requirements expressed as service level agreements (SLAs) and business level objects (BLOs). Populating a DW system from a set of information sources is realized with extract-transform-load (ETL) processes based on SLAs and BLOs. The entire task is complex, time consuming, and hard to be performed manually. This paper presents our approach to the requirement-driven creation of ETL designs. Each requirement is considered separately and a respective ETL design is produced. We propose an incremental method for consolidating these individual designs and creating an ETL design that satisfies all given requirements. Finally, the design produced is sent to an ETL engine for execution. We illustrate our approach through an example based on TPC-H and report on our experimental findings that show the effectiveness and quality of our approach.

[1]  Ryan Wisnesky,et al.  Orchid: Integrating Schema Mapping and ETL , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[2]  Kevin Wilkinson,et al.  QoX-driven ETL design: reducing the cost of ETL consulting engagements , 2009, SIGMOD Conference.

[3]  Ronald Fagin,et al.  Data exchange: semantics and query answering , 2005, Theor. Comput. Sci..

[4]  Eric Yu,et al.  Conceptual Modeling: Foundations and Applications , 2009 .

[5]  Werner Nutt,et al.  Containment of Aggregate Queries , 2003, ICDT.

[6]  Laura M. Haas,et al.  Clio: Schema Mapping Creation and Data Exchange , 2009, Conceptual Modeling: Foundations and Applications.

[7]  Per-Ake Larson,et al.  Performing Group-By before Join , 1994, ICDE 1994.

[8]  Alberto Abelló,et al.  GEM: Requirement-Driven Generation of ETL and Multidimensional Conceptual Designs , 2011, DaWaK.

[9]  Timos K. Sellis,et al.  Optimizing ETL processes in data warehouses , 2005, 21st International Conference on Data Engineering (ICDE'05).

[10]  Panos Kalnis,et al.  Multi-query optimization for on-line analytical processing , 2003, Inf. Syst..

[11]  Kevin Wilkinson,et al.  Leveraging Business Process Models for ETL Design , 2010, ER.

[12]  Paolo Papotti,et al.  Core schema mappings , 2009, SIGMOD Conference.

[13]  Esteban Zimányi,et al.  Defining ETL worfklows using BPMN and BPEL , 2009, DOLAP.

[14]  Kevin Wilkinson,et al.  Optimizing ETL workflows for fault-tolerance , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[15]  Peretz Shoval,et al.  Conceptual Modeling - ER 2010, 29th International Conference on Conceptual Modeling, Vancouver, BC, Canada, November 1-4, 2010. Proceedings , 2010, ER.

[16]  Jose-Norberto Mazón,et al.  An MDA approach for the development of data warehouses , 2008, Decis. Support Syst..

[17]  Hongjun Lu,et al.  Conceptual Modeling – ER 2004 , 2004, Lecture Notes in Computer Science.

[18]  Maurizio Lenzerini,et al.  Data integration: a theoretical perspective , 2002, PODS.

[19]  Gabriel M. Kuper,et al.  Structural Properties of XPath Fragments , 2003, ICDT.

[20]  Panos Vassiliadis,et al.  Conceptual modeling for ETL processes , 2002, DOLAP '02.

[21]  Panos Vassiliadis,et al.  Data Mapping Diagrams for Data Warehouse Design with UML , 2004, ER.