Chapter Five - Data Warehouse Testing

Abstract

Enterprises use data warehouses to accumulate data from multiple sources for data analysis and research. Since organizational decisions are often made based on the data stored in a data warehouse, all its components must be rigorously tested. Researchers have proposed a number of approaches and tools to test and evaluate different components of data warehouse systems. In this chapter, we present a comprehensive survey of data warehouse testing techniques. We define a classification framework that can categorize the existing testing approaches. We also discuss open problems and propose research directions.
