Test-Case Reduction via Test-Case Generation: Insights from the Hypothesis Reducer (Tool Insights Paper)

We describe internal test-case reduction, the method of test-case reduction employed by Hypothesis, a widely used property-based testing library for Python. The key idea is that, instead of applying test-case reduction externally to generated test cases, we apply it internally, to the sequence of random choices made during generation. A test case is thus reduced by repeatedly re-generating smaller and simpler test cases that continue to trigger some property of interest (e.g. a bug in the system under test). This allows fully generic test-case reduction without any user intervention and without the need to write a reducer specific to a particular application domain. It also significantly mitigates the test-case validity problem, because any reduced test case is one that could in principle have been generated. We describe the rationale behind this approach, explain its implementation in Hypothesis, and present an extensive evaluation comparing its effectiveness with that of several other test-case reducers, including C-Reduce and delta debugging, on applications including Python auto-formatting, C compilers, and the SymPy symbolic math library. We hope that these insights into the reduction mechanism employed by Hypothesis will be useful to researchers interested in randomized testing and test-case reduction, as the crux of the approach is fully generic and should be applicable to any random generator of test cases.

2012 ACM Subject Classification: Software and its engineering → Software testing and debugging
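The core idea of the abstract, reducing the recorded sequence of random choices rather than the generated test case itself, can be illustrated with a small Python sketch. This is not Hypothesis's implementation or API: the names `ChoiceSource`, `Overrun`, `gen_list`, `is_interesting`, and `reduce_choices` are hypothetical, the generator is a toy, and the two reduction passes (block deletion and value lowering) are a deliberately simplified stand-in for whatever passes a real reducer would use. The sketch assumes that a shorter, and then lexicographically smaller, choice sequence counts as "simpler".

```python
class Overrun(Exception):
    """Raised when a generator asks for more choices than were recorded."""


class ChoiceSource:
    """Replays a fixed sequence of integer choices to a generator (hypothetical)."""

    def __init__(self, choices):
        self.choices = list(choices)
        self.index = 0

    def draw(self, bound):
        """Return the next recorded choice, clamped into the range [0, bound]."""
        if self.index >= len(self.choices):
            raise Overrun()
        value = min(self.choices[self.index], bound)
        self.index += 1
        return value


def gen_list(source):
    """Toy generator: a list of integers in [0, 100], built only from choices."""
    result = []
    while source.draw(1):  # 1 means "add another element", 0 means "stop"
        result.append(source.draw(100))
    return result


def is_interesting(choices, generate, predicate):
    """Re-generate a test case from `choices` and check the property of interest."""
    try:
        return predicate(generate(ChoiceSource(choices)))
    except Overrun:
        return False  # the sequence was too short to produce a full test case


def reduce_choices(choices, generate, predicate):
    """Greedily shrink a choice sequence while it stays interesting."""
    best = list(choices)
    improved = True
    while improved:
        improved = False
        # Pass 1: try deleting contiguous blocks of choices.
        for size in (4, 2, 1):
            i = 0
            while i + size <= len(best):
                candidate = best[:i] + best[i + size:]
                if is_interesting(candidate, generate, predicate):
                    best, improved = candidate, True
                else:
                    i += 1
        # Pass 2: try replacing each choice with a smaller value.
        for i in range(len(best)):
            for smaller in sorted({0, best[i] // 2, best[i] - 1}):
                if 0 <= smaller < best[i]:
                    candidate = best[:i] + [smaller] + best[i + 1:]
                    if is_interesting(candidate, generate, predicate):
                        best, improved = candidate, True
                        break
    return best


# Choices recorded while generating the list [3, 57, 20].
recorded = [1, 3, 1, 57, 1, 20, 0]
reduced = reduce_choices(recorded, gen_list, lambda xs: any(x >= 10 for x in xs))
print(reduced, gen_list(ChoiceSource(reduced)))  # [1, 10, 0] [10]
```

Note that the reducer never inspects the generated list: it only edits the choice sequence and re-runs the generator. That is why the same loop works for any generator, and why every reduced test case is one the generator could have produced.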
