Quality measures for ETL processes: from goals to implementation

ETL processes play an increasingly important role in supporting modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of Business Process Management, with a focus mainly on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and a more human-centric approach is needed to bring ETL processes closer to business users' requirements. In this paper we take a first step in this direction by defining a sound model of ETL process quality characteristics, together with quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using Goal Modeling techniques.
