Search-Based Test Data Generation for SQL Queries

Database-centric systems strongly rely on SQL queries to manage and manipulate their data. These SQL commands can range from very simple selections to queries that involve several tables, subqueries, and grouping operations. And, as with any important piece of code, developers should properly test SQL queries. In order to completely test a SQL query, developers need to create test data that exercise all possible coverage targets in a query, e.g., JOINs and WHERE predicates. And indeed, this task can be challenging and time-consuming for complex queries. Previous studies have modeled the problem of generating test data as a constraint satisfaction problem and, with the help of SAT solvers, generate the required data. However, such approaches have strong limitations, such as partial support for queries with JOINs, subqueries, and strings (which are commonly used in SQL queries). In this paper, we model test data generation for SQL queries as a search-based problem. Then, we devise and evaluate three different approaches based on random search, biased random search, and genetic algorithms (GAs). The GA, in particular, uses a fitness function based on information extracted from the physical query plan of a database engine as search guidance. We then evaluate each approach in 2,135 queries extracted from three open source software and one industrial software system. Our results show that GA is able to completely cover 98.6% of all queries in the dataset, requiring only a few seconds per query. Moreover, it does not suffer from the limitations affecting state-of-the art techniques.

[1]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[2]  Jennifer Widom,et al.  Database System Implementation , 2000 .

[3]  Tao Xie,et al.  Database state generation via dynamic symbolic execution for coverage criteria , 2011, DBTest '11.

[4]  Phil McMinn,et al.  Search‐based software test data generation: a survey , 2004, Softw. Test. Verification Reliab..

[5]  Mohammad Alshraideh,et al.  Search‐based software test data generation for string data using program‐specific search operators , 2006, Softw. Test. Verification Reliab..

[6]  Phil McMinn,et al.  SchemaAnalyst: Search-Based Test Data Generation for Relational Database Schemas , 2016, 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[7]  Paolo Tonella,et al.  Reformulating Branch Coverage as a Many-Objective Optimization Problem , 2015, 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST).

[8]  Chengkai Li,et al.  Dynamic symbolic database application testing , 2010, DBTest '10.

[9]  GORDON FRASER,et al.  A Large-Scale Evaluation of Automated Unit Test Generation Using EvoSuite , 2014, ACM Trans. Softw. Eng. Methodol..

[10]  Nitesh V. Chawla,et al.  C4.5 and Imbalanced Data sets: Investigating the eect of sampling method, probabilistic estimate, and decision tree structure , 2003 .

[11]  Kalyanmoy Deb,et al.  A Comparative Analysis of Selection Schemes Used in Genetic Algorithms , 1990, FOGA.

[12]  Javier Tuya,et al.  Incremental test data generation for database queries , 2017, Automated Software Engineering.

[13]  Bogdan Korel,et al.  Automated Software Test Data Generation , 1990, IEEE Trans. Software Eng..

[14]  Paolo Tonella,et al.  Automated Test Case Generation as a Many-Objective Optimisation Problem with Dynamic Selection of the Targets , 2018, IEEE Transactions on Software Engineering.

[15]  Phil McMinn,et al.  The Effectiveness of Test Coverage Criteria for Relational Database Schema Integrity Constraints , 2015, ACM Trans. Softw. Eng. Methodol..

[16]  Javier Tuya,et al.  Full predicate coverage for testing SQL database queries , 2010 .

[17]  Gordon Fraser,et al.  A Memetic Algorithm for whole test suite generation , 2015, J. Syst. Softw..

[18]  Kalyanmoy Deb,et al.  Analysing mutation schemes for real-parameter genetic algorithms , 2014, Int. J. Artif. Intell. Soft Comput..

[19]  Yuhua Qi,et al.  The strength of random search on automated program repair , 2014, ICSE.

[20]  Carsten Binnig,et al.  QAGen: generating query-aware test databases , 2007, SIGMOD '07.

[21]  Dorothea Heiss-Czedik,et al.  An Introduction to Genetic Algorithms. , 1997, Artificial Life.

[22]  Lionel C. Briand,et al.  A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering , 2014, Softw. Test. Verification Reliab..

[23]  Edward P. K. Tsang,et al.  Foundations of constraint satisfaction , 1993, Computation in cognitive science.

[24]  Yi Zhang,et al.  MODA: automated test generation for database applications via mock objects , 2010, ASE '10.

[25]  Arie van Deursen,et al.  Search-Based Test Data Generation for SQL , 2018 .

[26]  Gordon Fraser,et al.  Whole Test Suite Generation , 2013, IEEE Transactions on Software Engineering.

[27]  Gordon Fraser,et al.  EvoSuite: automatic test suite generation for object-oriented software , 2011, ESEC/FSE '11.

[28]  Rupak Majumdar,et al.  Dynamic test input generation for database applications , 2007, ISSTA '07.

[29]  Mark Harman,et al.  Regression testing minimization, selection and prioritization: a survey , 2012, Softw. Test. Verification Reliab..

[30]  Sarfraz Khurshid,et al.  Query-Aware Test Generation Using a Relational Constraint Solver , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering.

[31]  Phyllis G. Frankl,et al.  Query-based test generation for database applications , 2008, DBTest '08.

[32]  Xavier Lorca,et al.  Choco: an Open Source Java Constraint Programming Library , 2008 .

[33]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[34]  Daniel Jackson,et al.  Alloy: a lightweight object modelling notation , 2002, TSEM.

[35]  Javier Tuya,et al.  Full predicate coverage for testing SQL database queries , 2010, Softw. Test. Verification Reliab..

[36]  Gordon Fraser,et al.  Seeding strategies in search‐based unit test generation , 2016, Softw. Test. Verification Reliab..