RACEZ: a lightweight and non-invasive race detection tool for production applications

Concurrency bugs, particularly data races, are notoriously difficult to debug and are a significant source of unreliability in multithreaded applications. Many tools to catch data races rely on program instrumentation to obtain memory instruction traces. Unfortunately, this instrumentation introduces significant runtime overhead, is extremely invasive, or has a limited domain of applicability making these tools unsuitable for many production systems. Consequently, these tools are typically used during application testing where many data races go undetected. This paper proposes RACEZ , a novel race detection mechanism which uses a sampled memory trace collected by the hardware performance monitoring unit rather than invasive instrumentation. The approach introduces only a modest overhead making it usable in production environments. We validate RACEZ using two open source server applications and the PARSEC benchmarks. Our experiments show that RACEZ catches a set of known bugs with reasonable probability while introducing only 2.8% runtime slow down on average.

[1]  Vijay Janapa Reddi,et al.  Code coverage testing using hardware performance monitoring support , 2005, AADEBUG'05.

[2]  Yuanyuan Zhou,et al.  Learning from mistakes: a comprehensive study on real world concurrency bug characteristics , 2008, ASPLOS.

[3]  Pravesh Kothari,et al.  A randomized scheduler with probabilistic guarantees of finding bugs , 2010, ASPLOS XV.

[4]  Horatiu Jula,et al.  Deadlock Immunity: Enabling Systems to Defend Against Deadlocks , 2008, OSDI.

[5]  Yuanyuan Zhou,et al.  AVIO: Detecting Atomicity Violations via Access-Interleaving Invariants , 2007, IEEE Micro.

[6]  Yuanyuan Zhou,et al.  Rx: treating bugs as allergies---a safe method to survive software failures , 2005, SOSP '05.

[7]  Josep Torrellas,et al.  Accurate and efficient filtering for the Intel thread checker race detector , 2006, ASID '06.

[8]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multithreaded programs , 1997, TOCS.

[9]  Jong-Deok Choi,et al.  Hybrid dynamic data race detection , 2003, PPoPP '03.

[10]  Yuanyuan Zhou,et al.  CTrigger: exposing atomicity violation bugs from their hiding places , 2009, ASPLOS.

[11]  Dawson R. Engler,et al.  RacerX: effective, static detection of race conditions and deadlocks , 2003, SOSP '03.

[12]  Assaf Schuster,et al.  MultiRace: efficient on‐the‐fly data race detection in multithreaded C++ programs , 2007, Concurr. Comput. Pract. Exp..

[13]  Junfeng Yang,et al.  Bypassing Races in Live Applications with Execution Filters , 2010, OSDI.

[14]  Jeffrey Dean,et al.  ProfileMe: hardware support for instruction-level profiling on out-of-order processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[15]  Konstantin Serebryany,et al.  ThreadSanitizer: data race detection in practice , 2009, WBIA '09.

[16]  Sorin Lerner,et al.  RELAY: static race detection on millions of lines of code , 2007, ESEC-FSE '07.

[17]  Stephen N. Freund,et al.  FastTrack: efficient and precise dynamic race detection , 2009, PLDI '09.

[18]  Michael D. Bond,et al.  PACER: proportional detection of data races , 2010, PLDI '10.

[19]  Easwaran Raman,et al.  MAO — An extensible micro-architectural optimizer , 2011, International Symposium on Code Generation and Optimization (CGO 2011).

[20]  Satish Narayanasamy,et al.  LiteRace: effective sampling for lightweight data-race detection , 2009, PLDI '09.

[21]  Josep Torrellas,et al.  SigRace: signature-based data race detection , 2009, ISCA '09.

[22]  Shan Lu,et al.  ConMem: detecting severe concurrency bugs through an effect-oriented approach , 2010, ASPLOS XV.

[23]  Pin Zhou,et al.  HARD: Hardware-Assisted Lockset-based Race Detection , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[24]  Koushik Sen,et al.  Race directed random testing of concurrent programs , 2008, PLDI '08.

[25]  Thomas Ball,et al.  Finding and Reproducing Heisenbugs in Concurrent Programs , 2008, OSDI.

[26]  Wenguang Chen,et al.  Taming hardware event samples for FDO compilation , 2010, CGO '10.