A model-driven framework for ETL process development

Extract-transform-load (ETL) processes are the backbone of a data warehouse, since they supply it with the integrated and reconciled data it needs from heterogeneous and distributed data sources. However, ETL process development, and particularly its design phase, is still perceived as a time-consuming task. This is mainly because ETL processes are typically designed with a specific technology in mind from the very beginning of the development process. Thus, it is difficult to share and reuse methodologies and best practices among projects implemented with different technologies. To the best of our knowledge, no attempt has yet been made to harmonize ETL process development by proposing a common and integrated development strategy. To overcome this drawback, this paper introduces a framework for model-driven development of ETL processes. The benefit of our framework is twofold: (i) it uses vendor-independent models for a unified design of ETL processes, based on the expressive and well-known standard for modeling business processes, the Business Process Modeling Notation (BPMN), and (ii) it automatically transforms these models into the vendor-specific code required to execute the ETL process on a concrete platform.
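
The two-part idea, one vendor-independent model and automatically generated platform-specific code, can be illustrated with a minimal sketch. This is not the framework's actual metamodel or transformation language: the classes `Task` and `EtlProcess`, the generator functions `to_sql` and `to_job_script`, and the table names are all hypothetical, and the flat task list only loosely stands in for a BPMN sequence flow.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One vendor-independent ETL step (hypothetical, loosely analogous to a BPMN task)."""
    name: str
    kind: str  # one of "extract", "filter", "load"
    params: dict = field(default_factory=dict)


@dataclass
class EtlProcess:
    """An ordered list of tasks; a simplified stand-in for a BPMN sequence flow."""
    name: str
    tasks: list


def to_sql(process: EtlProcess) -> str:
    """Generate a single SQL statement from the model (illustrative target platform 1)."""
    source = target = None
    conditions = []
    for task in process.tasks:
        if task.kind == "extract":
            source = task.params["table"]
        elif task.kind == "filter":
            conditions.append(task.params["condition"])
        elif task.kind == "load":
            target = task.params["table"]
        else:
            raise ValueError(f"unsupported task kind: {task.kind}")
    if source is None or target is None:
        raise ValueError("process needs an extract task and a load task")
    sql = f"INSERT INTO {target} SELECT * FROM {source}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + ";"


def to_job_script(process: EtlProcess) -> str:
    """Generate a line-per-step job description (illustrative target platform 2)."""
    lines = [f"job {process.name}"]
    for index, task in enumerate(process.tasks, start=1):
        args = " ".join(f"{key}={value!r}" for key, value in task.params.items())
        lines.append(f"  step {index}: {task.kind} {args}")
    return "\n".join(lines)


# The same model is designed once and handed to both generators.
process = EtlProcess(
    name="load_orders",
    tasks=[
        Task("read staging", "extract", {"table": "staging_orders"}),
        Task("drop invalid rows", "filter", {"condition": "amount > 0"}),
        Task("write warehouse", "load", {"table": "dw_orders"}),
    ],
)

print(to_sql(process))
print(to_job_script(process))
```

The design point of the sketch is that `process` contains no platform detail, so supporting another platform means adding a generator function and leaving the model untouched.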
