Finding concurrency bugs with context-aware communication graphs

Incorrect thread synchronization often leads to concurrency bugs that manifest nondeterministically and are difficult to detect and fix. Past work on detecting concurrency bugs has addressed the general problem in an ad-hoc fashion, focusing mostly on data races and atomicity violations. Using graphs to represent a multithreaded program execution is very natural, nodes represent static instructions and edges represent communication via shared memory. In this paper we make the fundamental observation that such basic context-oblivious graphs do not encode enough information to enable accurate bug detection. We propose context-aware communication graphs, a new kind of communication graph that encodes global ordering information by embedding communication contexts. We then build Bugaboo, a simple and generic framework that accurately detects complex concurrency bugs. Our framework collects communication graphs from multiple executions and uses invariant-based techniques to detect anomalies in the graphs. We built two versions of Bugaboo: BB-SW, which is fully implemented in software but suffers from significant slowdowns; and BB-HW, which relies on custom architecture support but has negligible performance degradation. BB-HW requires modest extensions to a commodity multicore processor and can be used in deployment settings. We evaluate both versions using applications such as MySQL, Apache, PARSEC, and several others. Our results show that Bugaboo identifies a wide variety of concurrency bugs, including challenging multivariable bugs, with few (often zero) unnecessary code inspections.

[1]  Sudheendra Hangal,et al.  Tracking down software bugs using automatic anomaly detection , 2002, ICSE '02.

[2]  Pin Zhou,et al.  HARD: Hardware-Assisted Lockset-based Race Detection , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[3]  Brandon Lucia,et al.  Atom-Aid: Detecting and Surviving Atomicity Violations , 2009, IEEE Micro.

[4]  Alex Aiken,et al.  Cooperative Bug Isolation , 2007 .

[5]  Yuanyuan Zhou,et al.  AVIO: Detecting Atomicity Violations via Access-Interleaving Invariants , 2007, IEEE Micro.

[6]  Yuanyuan Zhou,et al.  CTrigger: exposing atomicity violation bugs from their hiding places , 2009, ASPLOS.

[7]  Satish Narayanasamy,et al.  A case for an interleaving constrained shared-memory multi-processor , 2009, ISCA '09.

[8]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multi-threaded programs , 1997, TOCS.

[9]  Thomas Ball,et al.  Finding and Reproducing Heisenbugs in Concurrent Programs , 2008, OSDI.

[10]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[11]  M. Lam,et al.  Tracking down software bugs using automatic anomaly detection , 2002, Proceedings of the 24th International Conference on Software Engineering. ICSE 2002.

[12]  William G. Griswold,et al.  Quickly detecting relevant program invariants , 2000, Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium.

[13]  Dhabaleswar K. Panda,et al.  DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[14]  Koen De Bosschere,et al.  RecPlay: a fully integrated practical record/replay system , 1999, TOCS.

[15]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[16]  Josep Torrellas,et al.  ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes , 2003, ISCA '03.

[17]  Dawson R. Engler,et al.  Bugs as deviant behavior: a general approach to inferring errors in systems code , 2001, SOSP.

[18]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[19]  Yuanyuan Zhou,et al.  Learning from mistakes: a comprehensive study on real world concurrency bug characteristics , 2008, ASPLOS.

[20]  Cormac Flanagan,et al.  A type and effect system for atomicity , 2003, PLDI.

[21]  Min Xu,et al.  A serializability violation detector for shared-memory server programs , 2005, PLDI '05.

[22]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multithreaded programs , 1997, TOCS.