Applying combinatorial test data generation to big data applications

Big data applications (e.g., Extract, Transform, and Load (ETL) applications) are designed to handle large volumes of data. However, processing such volumes is time-consuming, so agile development of big data applications needs small yet effective test data sets. In this paper, we apply a combinatorial test data generation approach to two real-world ETL applications at Medidata. In our approach, we first create Input Domain Models (IDMs) automatically by analyzing the original data source and incorporating constraints manually derived from requirements. Next, the IDMs are used to create test data sets that achieve t-way coverage, which has been shown to be very effective in detecting software faults. The generated test data sets also satisfy all the constraints identified in the first step. To avoid creating IDMs from scratch when the original data source or the constraints change, our approach extends the original IDMs with additional information. The new IDMs, which we refer to as Adaptive IDMs (AIDMs), are updated by comparing the changes against the additional information, and are then used to generate new test data sets. We implement our approach in a tool called comBinatorial big daTa Test dAta Generator (BIT-TAG). Our experience shows that combinatorial testing can be effectively applied to big data applications. In particular, the test data sets created using our approach for the two ETL applications are only a small fraction of the original data source, yet they detected all the faults found with the original data source.
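
To make the idea of constrained t-way coverage concrete, the following is a minimal sketch for t = 2 (pairwise coverage). It is not the BIT-TAG implementation: it uses a simple greedy selection over all valid rows, which is only practical for tiny models, and the IDM contents, the constraint, and the names `idm`, `is_valid`, and `pairwise_tests` are hypothetical and chosen purely for illustration.

```python
from itertools import combinations, product

# Hypothetical Input Domain Model: each parameter maps to its abstract values.
idm = {
    "gender": ["M", "F"],
    "age_group": ["child", "adult", "senior"],
    "status": ["screening", "enrolled", "withdrawn"],
}

def is_valid(row):
    """Hypothetical requirement-derived constraint: children are never 'withdrawn'."""
    return not (row["age_group"] == "child" and row["status"] == "withdrawn")

def pairwise_tests(idm, is_valid):
    """Greedy 2-way generation: repeatedly pick the valid row that covers
    the most value pairs not yet covered by earlier rows."""
    params = list(idm)
    # Enumerate every combination, then keep only rows satisfying the constraint.
    rows = [dict(zip(params, vals)) for vals in product(*(idm[p] for p in params))]
    rows = [r for r in rows if is_valid(r)]

    def pairs(row):
        # All parameter-value pairs that appear together in this row.
        return {((a, row[a]), (b, row[b])) for a, b in combinations(params, 2)}

    # Only pairs that occur in at least one valid row need to be covered.
    uncovered = set().union(*(pairs(r) for r in rows))
    tests = []
    while uncovered:
        best = max(rows, key=lambda r: len(pairs(r) & uncovered))
        tests.append(best)
        uncovered -= pairs(best)
    return tests

tests = pairwise_tests(idm, is_valid)
for t in tests:
    print(t)
```

Every generated row satisfies the constraint, and every pair of values that can legally occur together appears in at least one row. Production tools typically use algorithms such as IPOG, which avoid enumerating the full input space.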
