Clash of the Titans: tools and techniques for hunting bugs in concurrent programs

In this work we present a benchmark suite of concurrent programs, spanning multiple programming languages, for evaluating the bug-detection capabilities of different tools and techniques. We have compiled a set of Java benchmarks from several sources as well as from our own efforts, and for many of the Java examples we have created equivalent C# programs. All the benchmarks are available for download. On this multi-language suite we compare results from several tools: CalFuzzer, ConTest, CHESS, and Java PathFinder. For Java PathFinder we provide extensive results for stateless random walk, randomized depth-first search, and guided search using abstraction refinement. Using data from our study, we argue that iterative context-bounding and dynamic partial order reduction are not by themselves sufficient to make model checking tractable for testing concurrent programs, and that secondary techniques such as guidance strategies are required. As part of this work, we have also created a wiki that publishes benchmark details and tool results on those benchmarks to the broader research community.
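
To make the kind of benchmark program concrete, below is a minimal sketch of the sort of seeded concurrency defect such suites typically contain: an unsynchronized read-modify-write on shared state that two threads race on, which tools like ConTest, CalFuzzer, or Java PathFinder are expected to expose by perturbing or exploring thread schedules. The RacyCounter class, its counter field, and the loop bound are illustrative assumptions, not taken from the actual benchmark suite.

    // Illustrative example of a data-race bug of the kind found in
    // concurrency benchmark programs; names are hypothetical.
    public class RacyCounter {
        private static int counter = 0; // shared state, no synchronization

        public static void main(String[] args) throws InterruptedException {
            Runnable increment = () -> {
                for (int i = 0; i < 10_000; i++) {
                    counter++; // read-modify-write is not atomic: lost updates possible
                }
            };

            Thread t1 = new Thread(increment);
            Thread t2 = new Thread(increment);
            t1.start();
            t2.start();
            t1.join();
            t2.join();

            // Under an unlucky interleaving the final value is less than 20000;
            // that observable symptom is what a race detector or a model checker
            // exploring schedules is expected to flag.
            System.out.println("Expected 20000, got " + counter);
        }
    }

A scheduler-exploring tool finds the bug by forcing a context switch between the read and the write of counter, whereas plain repeated execution may never exhibit it.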
