Generating focused random tests using directed swarm testing

Random testing can be a powerful and scalable method for finding faults in software, but sophisticated random testers usually test a whole program rather than individual components, and writing random testers for individual components of complex programs may require unreasonable effort. In this paper we present a novel method, directed swarm testing, which uses statistics over existing test executions and a variation of random testing to produce random tests that focus on only part of a program, increasing the frequency with which tests cover the targeted code. We demonstrate the effectiveness of this technique on real-world programs and test systems (the YAFFS2 file system, GCC, and Mozilla's SpiderMonkey JavaScript engine) and discuss various strategies for directed swarm testing. The best strategies improve coverage frequency for targeted code by a factor of 1.1x to 4.5x on average, and by nearly 3x to nearly 9x in the best case. For YAFFS2, directed swarm testing never decreased coverage; for GCC and SpiderMonkey, coverage increased for over 99% and 73% of targets, respectively, using the best strategies. Directed swarm testing also improves detection rates for real SpiderMonkey faults when the code from the fault-introducing commit is targeted. This lightweight technique is applicable to existing industrial-strength random testers.
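To make the approach concrete, the following minimal sketch (hypothetical feature names and helper function, not code from the paper) shows one plausible directed-swarm strategy: every feature statistically identified as a trigger for the targeted code is enabled, every suppressor is disabled, and the remaining features are enabled with probability 0.5, as in ordinary swarm testing.

```python
import random

def directed_config(features, triggers, suppressors, p_other=0.5):
    """Build one swarm configuration biased toward a coverage target.

    Sketch under assumed inputs: 'triggers' are features statistically
    associated with covering the target, 'suppressors' with missing it;
    all other features are enabled with probability p_other, as in
    plain (undirected) swarm testing.
    """
    config = set()
    for f in features:
        if f in triggers:
            config.add(f)            # always enable features that help reach the target
        elif f in suppressors:
            continue                 # never enable features that block the target
        elif random.random() < p_other:
            config.add(f)            # irrelevant features: ordinary coin-flip swarm choice
    return config

# Hypothetical example: bias C-program generation toward pointer-handling code.
features = {"pointers", "arrays", "structs", "unions", "bitfields", "jumps"}
print(directed_config(features, triggers={"pointers", "arrays"},
                      suppressors={"jumps"}))
```

Each generated configuration would then be handed to the underlying random tester (e.g., a program generator or API-call fuzzer) as its set of enabled features, so many such configurations together form a test swarm biased toward the targeted code.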
