Testing the Extract-Transform-Load Process in Data Warehouse Systems

Enterprises use data warehouses to accumulate data from multiple sources for analysis and research. A data warehouse is populated by the Extract, Transform, and Load (ETL) process, which (1) extracts data from various sources, (2) integrates, cleans, and transforms it into a common form, and (3) loads it into the data warehouse. Faults in the implementation or execution of the ETL process can leave incorrect data in the data warehouse, which renders it useless regardless of the quality of the source data and of the applications that access it. ETL processes must therefore be thoroughly tested to validate the correctness of the ETL implementation. This project develops and evaluates two types of functional tests: data quality tests and balancing tests. Data quality tests validate the data in the target data warehouse in isolation, whereas balancing tests check for discrepancies between the source and target data. This paper describes the proposed approach, the work accomplished to date, and the expected contributions of this research.
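As a rough illustration of the difference between the two kinds of tests, the Python sketch below checks an in-memory target table on its own (data quality) and against its source (balancing). The list-of-dicts table layout, the column names `id` and `amount`, the function names, and the specific rules (non-null required columns, unique keys, equal row counts, equal totals of a numeric measure, matching key sets) are all hypothetical choices for illustration; they are not the test suite developed in this project.

```python
import math
from collections import Counter
from typing import Any, Iterable

# Hypothetical representation: a table is a list of rows, each row a dict
# mapping column name -> value. Real ETL tests would query the databases.
Row = dict[str, Any]


def data_quality_violations(target: list[Row], key: str,
                            required: Iterable[str]) -> list[str]:
    """Data quality tests: inspect the target table in isolation.

    Two illustrative rules: required columns are non-null, and the key is unique.
    """
    required = list(required)  # allow any iterable, reused for every row
    problems = []
    for i, row in enumerate(target):
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: {col} is null")
    for k, n in Counter(row.get(key) for row in target).items():
        if n > 1:
            problems.append(f"key {k!r} appears {n} times")
    return problems


def balancing_violations(source: list[Row], target: list[Row], key: str,
                         measure: str) -> list[str]:
    """Balancing tests: look for discrepancies between source and target.

    Three illustrative comparisons: row counts, the total of one numeric
    measure, and keys present on only one side.
    """
    problems = []
    if len(source) != len(target):
        problems.append(f"row count: source={len(source)} target={len(target)}")
    src_total = sum(row.get(measure) or 0 for row in source)
    tgt_total = sum(row.get(measure) or 0 for row in target)
    if not math.isclose(src_total, tgt_total):
        problems.append(f"{measure} total: source={src_total} target={tgt_total}")
    src_keys = {row.get(key) for row in source}
    tgt_keys = {row.get(key) for row in target}
    if missing := src_keys - tgt_keys:
        problems.append(f"keys missing from target: {sorted(missing, key=repr)}")
    if extra := tgt_keys - src_keys:
        problems.append(f"keys not in source: {sorted(extra, key=repr)}")
    return problems


# Hypothetical sample data: a faulty load dropped row 3 and duplicated row 2.
source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}, {"id": 3, "amount": 75}]
target = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}, {"id": 2, "amount": 250}]

if __name__ == "__main__":
    # ['key 2 appears 2 times']
    print(data_quality_violations(target, key="id", required=["id", "amount"]))
    # ['amount total: source=425 target=600', 'keys missing from target: [3]']
    print(balancing_violations(source, target, key="id", measure="amount"))
```

In this sample the row counts agree, because the faulty load both dropped one row and duplicated another; the problem only surfaces through the duplicate-key check on the target and through the measure total and key comparison against the source, which suggests why balancing tests usually compare more than record counts.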
