Ballerina: Automatic generation and clustering of efficient random unit tests for multithreaded code

Testing multithreaded code is hard and expensive. A multithreaded unit test creates two or more threads, each executing one or more methods on shared objects of the class under test. Such unit tests can be generated at random, but basic random generation produces tests that are either slow or do not trigger concurrency bugs. Worse, such tests produce many false alarms that require human effort to filter out. We present Ballerina, a novel technique for automated random generation of efficient multithreaded tests that effectively trigger concurrency bugs. Ballerina makes tests efficient by having only two threads, each executing a single, randomly selected method. Ballerina increases the chance that such simple parallel code finds bugs by appending it to more complex, randomly generated sequential code. We also propose a clustering technique to reduce the manual effort of inspecting failures of automatically generated multithreaded tests. We evaluate Ballerina on 14 real-world bugs from six popular codebases: Groovy, JDK, JFreeChart, Apache Log4j, Apache Lucene, and Apache Pool. The experiments show that tests generated by Ballerina find bugs 2×-10× faster on average than basic random generation, and that our clustering technique reduces the number of failures to inspect by 4×-8× on average. Using Ballerina, we found three previously unknown bugs, two of which have already been confirmed and fixed.
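
The test shape described above (a random sequential prefix followed by exactly two threads, each running one randomly selected method on the shared object) can be sketched in Java as follows. This is a minimal illustration under our own assumptions, not Ballerina's implementation: the class under test is java.util.ArrayList, and the method pool, the failure oracle (an exception in either thread, or a thread still running after a timeout), and the clustering key (the pair of parallel methods plus the exception type) are hypothetical choices made only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class BallerinaSketch {
    // Hypothetical method pool for the class under test (java.util.ArrayList is not thread-safe).
    static final Map<String, Consumer<List<Integer>>> METHODS = new LinkedHashMap<>();
    static {
        METHODS.put("add", l -> l.add(1));
        METHODS.put("removeFirst", l -> { if (!l.isEmpty()) l.remove(0); });
        METHODS.put("clear", l -> l.clear());
        METHODS.put("iterate", l -> { for (Integer x : l) { } });
    }
    static final List<String> NAMES = new ArrayList<>(METHODS.keySet());

    static String pick(Random rnd) {
        return NAMES.get(rnd.nextInt(NAMES.size()));
    }

    /** Generates and runs one test; returns a cluster key if it failed, or null if it passed. */
    static String runOneTest(Random rnd, int prefixLength) throws InterruptedException {
        List<Integer> shared = new ArrayList<>();
        // Sequential prefix: random calls on a single thread build up a nontrivial object state.
        for (int i = 0; i < prefixLength; i++) {
            METHODS.get(pick(rnd)).accept(shared);
        }
        // Parallel suffix: exactly two threads, each executing one randomly selected method.
        String m1 = pick(rnd), m2 = pick(rnd);
        CountDownLatch go = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread t1 = worker(go, METHODS.get(m1), shared, failure);
        Thread t2 = worker(go, METHODS.get(m2), shared, failure);
        go.countDown(); // release both threads at roughly the same moment
        t1.join(1000);
        t2.join(1000);
        // Assumed oracle: a thread still alive after the timeout is reported as a possible hang.
        if (t1.isAlive() || t2.isAlive()) {
            return m1 + "|" + m2 + "|timeout";
        }
        Throwable t = failure.get();
        // Assumed clustering key: the pair of parallel methods plus the exception type.
        return t == null ? null : m1 + "|" + m2 + "|" + t.getClass().getSimpleName();
    }

    static Thread worker(CountDownLatch go, Consumer<List<Integer>> method,
                         List<Integer> shared, AtomicReference<Throwable> failure) {
        Thread t = new Thread(() -> {
            try {
                go.await();
                method.accept(shared);
            } catch (Throwable e) {
                failure.compareAndSet(null, e); // keep only the first failure
            }
        });
        t.setDaemon(true); // a hung thread must not keep the JVM alive
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Random rnd = new Random(42);
        Map<String, Integer> clusters = new TreeMap<>();
        int failures = 0;
        for (int i = 0; i < 2000; i++) {
            String key = runOneTest(rnd, 1 + rnd.nextInt(20));
            if (key != null) {
                failures++;
                clusters.merge(key, 1, Integer::sum);
            }
        }
        System.out.println(failures + " failures in " + clusters.size() + " clusters");
        clusters.forEach((k, n) -> System.out.println("  " + k + ": " + n));
    }
}
```

Running main prints how many generated tests failed and how those failures group under the assumed key; because thread scheduling is nondeterministic, the counts vary from run to run. Grouping by key means a developer would inspect one representative per cluster instead of every failing test, which is the kind of reduction in manual effort the clustering step aims for.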