A BPMN-Based Design and Maintenance Framework for ETL Processes

Business Intelligence (BI) applications require the design, implementation, and maintenance of processes that extract, transform, and load suitable data for analysis. The development of these processes, known as ETL, is an inherently complex problem that is typically costly and time-consuming. In previous work, the authors proposed a vendor-independent language to reduce the design complexity caused by disparate ETL languages, each tailored to a specific design tool with a steep learning curve. Nevertheless, the designer still faces two major issues during the development of ETL processes: (i) how to implement the designed processes in an executable language, and (ii) how to maintain the implementation when the organization's data infrastructure evolves. In this paper, the authors propose a model-driven framework that provides automatic code generation and improved maintenance support for their ETL language. They present a set of model-to-text transformations that produce code for different commercial ETL tools, as well as model-to-model transformations that automatically update the ETL models so that the generated code can be maintained as the data sources evolve. As an initial validation, a demonstration on an example shows that the framework, which covers modeling, code generation, and maintenance, can be used in practice.
