Search-Based Test Data Generation for SQL

Database-centric systems strongly rely on SQL queries to manage and manipulate their data. These SQL commands can range from very simple selections to queries that involve several tables, subqueries, and grouping operations. And, as with any important piece of code, developers should properly test SQL queries. In order to completely test a SQL query, developers need to create test data that exercise all possible coverage targets in a query, e.g., JOINs and WHERE predicates. And indeed, this task can be challenging and time-consuming for complex queries. Previous studies have modeled the problem of generating test data as a constraint satisfaction problem and, with the help of SAT solvers, generate the required data. However, such approaches have strong limitations, such as partial support for queries with JOINs, subqueries, and strings (which are commonly used in SQL queries). In this paper, we model test data generation for SQL queries as a search-based problem. Then, we devise and evaluate three diferent approaches based on random search, biased random search, and genetic algorithms (GAs). The GA, in particular, uses a itness function based on information extracted from the physical query plan of a database engine as search guidance. We then evaluate each approach in 2,135 queries extracted from three open source software and one industrial software system. Our results show that GA is able to completely cover 98.6% of all queries in the dataset, requiring only a few seconds per query. Moreover, it does not sufer from the limitations afecting state-of-the art techniques.

[1]  Nitesh V. Chawla,et al.  C4.5 and Imbalanced Data sets: Investigating the eect of sampling method, probabilistic estimate, and decision tree structure , 2003 .

[2]  Gordon Fraser,et al.  Seeding strategies in search‐based unit test generation , 2016, Softw. Test. Verification Reliab..

[3]  Sarfraz Khurshid,et al.  Query-Aware Test Generation Using a Relational Constraint Solver , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering.

[4]  Javier Tuya,et al.  Full predicate coverage for testing SQL database queries , 2010, Softw. Test. Verification Reliab..

[5]  S. N. Sivanandam,et al.  Introduction to genetic algorithms , 2007 .

[6]  Xavier Lorca,et al.  Choco: an Open Source Java Constraint Programming Library , 2008 .

[7]  Mark Harman,et al.  Regression testing minimization, selection and prioritization: a survey , 2012, Softw. Test. Verification Reliab..

[8]  Chengkai Li,et al.  Dynamic symbolic database application testing , 2010, DBTest '10.

[9]  Lionel C. Briand,et al.  A Hitchhiker's guide to statistical tests for assessing randomized algorithms in software engineering , 2014, Softw. Test. Verification Reliab..

[10]  Phyllis G. Frankl,et al.  Query-based test generation for database applications , 2008, DBTest '08.

[11]  Phil McMinn,et al.  The Effectiveness of Test Coverage Criteria for Relational Database Schema Integrity Constraints , 2015, ACM Trans. Softw. Eng. Methodol..

[12]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[13]  Gordon Fraser,et al.  Whole Test Suite Generation , 2013, IEEE Transactions on Software Engineering.

[14]  Paolo Tonella,et al.  Reformulating Branch Coverage as a Many-Objective Optimization Problem , 2015, 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST).

[15]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[16]  Carsten Binnig,et al.  QAGen: generating query-aware test databases , 2007, ACM SIGMOD Conference.

[17]  Yi Zhang,et al.  MODA: automated test generation for database applications via mock objects , 2010, ASE '10.

[18]  Mohammad Alshraideh,et al.  Search-based software test data generation for string data using program-specific search operators: Research Articles , 2006 .

[19]  Gordon Fraser,et al.  A Memetic Algorithm for whole test suite generation , 2015, J. Syst. Softw..

[20]  Yuhua Qi,et al.  The strength of random search on automated program repair , 2014, ICSE.

[21]  Rupak Majumdar,et al.  Dynamic test input generation for database applications , 2007, ISSTA '07.

[22]  Daniel Jackson,et al.  Alloy: a lightweight object modelling notation , 2002, TSEM.

[23]  Javier Tuya,et al.  Incremental test data generation for database queries , 2017, Automated Software Engineering.

[24]  Gordon Fraser,et al.  EvoSuite: automatic test suite generation for object-oriented software , 2011, ESEC/FSE '11.

[25]  Tao Xie,et al.  Database state generation via dynamic symbolic execution for coverage criteria , 2011, DBTest '11.

[26]  Kalyanmoy Deb,et al.  Analysing mutation schemes for real-parameter genetic algorithms , 2014, Int. J. Artif. Intell. Soft Comput..

[27]  Kalyanmoy Deb,et al.  A Comparative Analysis of Selection Schemes Used in Genetic Algorithms , 1990, FOGA.

[28]  GORDON FRASER,et al.  A Large-Scale Evaluation of Automated Unit Test Generation Using EvoSuite , 2014, ACM Trans. Softw. Eng. Methodol..

[29]  McMinn Phil,et al.  SchemaAnalyst: Search-Based Test Data Generation for Relational Database Schemas , 2016 .

[30]  Paolo Tonella,et al.  Automated Test Case Generation as a Many-Objective Optimisation Problem with Dynamic Selection of the Targets , 2018, IEEE Transactions on Software Engineering.

[31]  Edward P. K. Tsang,et al.  Foundations of constraint satisfaction , 1993, Computation in cognitive science.

[32]  Bogdan Korel,et al.  Automated Software Test Data Generation , 1990, IEEE Trans. Software Eng..

[33]  Phil McMinn,et al.  Search‐based software test data generation: a survey , 2004, Softw. Test. Verification Reliab..