A proposed model for data warehouse ETL processes

Extraction-transformation-loading (ETL) tools are pieces of software responsible for extracting data from several sources, cleansing, customizing, reformatting, and integrating it, and inserting it into a data warehouse. Building the ETL process is potentially one of the biggest tasks in building a warehouse; it is complex and time consuming, and it consumes most of a data warehouse project's implementation effort, cost, and resources. Building a data warehouse requires a close understanding of three main areas: the source area, the destination area, and the mapping area (ETL processes). The source area has standard models such as the entity-relationship diagram, and the destination area has standard models such as the star schema, but the mapping area has no standard model so far. Despite the importance of ETL processes, little research has been done in this area because of its complexity, and there is a clear lack of a standard model for representing ETL scenarios. In this paper we survey the efforts made to conceptualize ETL processes. Research on modeling ETL processes can be categorized into three main approaches: modeling based on mapping expressions and guidelines, modeling based on conceptual constructs, and modeling based on the UML environment. These projects try to represent the main mapping activities at the conceptual level. Because the proposed solutions for the conceptual design of ETL processes differ from one another and each has limitations, this paper also proposes a model for the conceptual design of ETL processes. The proposed model builds on and enhances the previous models to support some missing mapping features.
