Data Integration Patterns for Data Warehouse Automation

This paper presents a mapping-based, metadata-driven, modular data transformation framework designed to address extract-transform-load (ETL) automation, impact analysis, data quality, and integration problems in data warehouse environments. We introduce a declarative mapping formalization technique, an abstract expression pattern concept, and a related template engine technology for flexible ETL code generation and execution. The feasibility and efficiency of the approach are demonstrated in pattern detection and data lineage analysis case studies using large real-life SQL corpora.
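
To make the idea of declarative mappings, expression patterns, and template-based code generation more concrete, the following is a minimal sketch in Python. It is not the framework described in the paper: the mapping layout, the pattern names (`copy`, `trim_upper`, `default`), the table and column names, and the functions `render_sql` and `lineage` are all hypothetical, chosen only for illustration. It uses the standard-library `string.Template` as a stand-in for a template engine.

```python
from string import Template

# Hypothetical declarative mapping: one source table, one target table,
# and a named expression pattern per target column.
MAPPING = {
    "target": "dw.customer",
    "source": "stg.customer",
    "columns": [
        {"target": "customer_id", "pattern": "copy", "args": {"col": "id"}},
        {"target": "full_name", "pattern": "trim_upper", "args": {"col": "name"}},
        {"target": "country", "pattern": "default",
         "args": {"col": "country_code", "value": "'N/A'"}},
    ],
}

# Hypothetical abstract expression patterns, stored as reusable templates.
PATTERNS = {
    "copy": Template("$col"),
    "trim_upper": Template("UPPER(TRIM($col))"),
    "default": Template("COALESCE($col, $value)"),
}

# Statement-level template that the column expressions are plugged into.
STATEMENT = Template(
    "INSERT INTO $target ($target_cols)\nSELECT $exprs\nFROM $source;"
)


def render_sql(mapping):
    """Generate an INSERT ... SELECT statement from a declarative mapping."""
    exprs = [
        PATTERNS[c["pattern"]].substitute(c["args"]) for c in mapping["columns"]
    ]
    return STATEMENT.substitute(
        target=mapping["target"],
        source=mapping["source"],
        target_cols=", ".join(c["target"] for c in mapping["columns"]),
        exprs=", ".join(exprs),
    )


def lineage(mapping):
    """Derive column-level (source, target) lineage pairs from the same metadata."""
    return [
        (f'{mapping["source"]}.{c["args"]["col"]}',
         f'{mapping["target"]}.{c["target"]}')
        for c in mapping["columns"]
    ]


if __name__ == "__main__":
    print(render_sql(MAPPING))
    for src, tgt in lineage(MAPPING):
        print(f"{src} -> {tgt}")
```

Because the mapping is plain metadata, the same structure drives both code generation and lineage extraction in this sketch, which mirrors the kind of reuse the abstract attributes to a metadata-driven approach.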
