RADISH: always-on sound and complete RAce Detection In Software and Hardware

Data-race freedom is a valuable safety property for multithreaded programs that helps with catching bugs, simplifying memory consistency model semantics, and verifying and enforcing both atomicity and determinism. Unfortunately, existing software-only dynamic race detectors are precise but slow; proposals with hardware support offer higher performance but are imprecise. Both precision and performance are necessary to achieve the many advantages always-on dynamic race detection could provide. To resolve this trade-off, we propose Radish, a hybrid hardware-software dynamic race detector that is always-on and fully precise. In Radish, hardware caches a principled subset of the metadata necessary for race detection; this subset allows the vast majority of race checks to occur completely in hardware. A flexible software layer handles persistence of race detection metadata on cache evictions and occasional queries to this expanded set of metadata. We show that Radish is correct by proving equivalence to a conventional happens-before race detector. Our design has modest hardware complexity: caches are completely unmodified and we piggy-back on existing coherence messages but do not otherwise modify the protocol. Furthermore, Radish can leverage type-safe languages to reduce overheads substantially. Our evaluation of a simulated 8-core Radish processor using PARSEC benchmarks shows runtime overheads from negligible to 2x, outperforming the leading software-only race detector by 2x-37x.
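For readers unfamiliar with the baseline, the following is a minimal sketch of a conventional happens-before race detector built on vector clocks, the kind of detector the abstract says Radish is proved equivalent to. It is an illustration only, not Radish's hardware-software design: the class name `HBDetector`, its method names, and the choice to keep a full vector clock per address are all assumptions made for this example.

```python
from collections import defaultdict


class HBDetector:
    """Toy happens-before race detector (hypothetical names, full vector clocks)."""

    def __init__(self, num_threads):
        # Each thread starts with its own clock component set to 1.
        self.thread_vc = [
            [1 if i == t else 0 for i in range(num_threads)]
            for t in range(num_threads)
        ]
        # Per-lock and per-address metadata, created lazily as all-zero clocks.
        self.lock_vc = defaultdict(lambda: [0] * num_threads)
        self.read_vc = defaultdict(lambda: [0] * num_threads)
        self.write_vc = defaultdict(lambda: [0] * num_threads)
        self.races = []

    def _ordered_before(self, access_vc, t):
        # True when every recorded access happens before thread t's current point.
        return all(a <= c for a, c in zip(access_vc, self.thread_vc[t]))

    def acquire(self, t, lock):
        # Acquiring a lock joins the releasing thread's clock into thread t's clock.
        self.thread_vc[t] = [
            max(a, b) for a, b in zip(self.thread_vc[t], self.lock_vc[lock])
        ]

    def release(self, t, lock):
        # Releasing publishes thread t's clock on the lock, then advances t.
        self.lock_vc[lock] = list(self.thread_vc[t])
        self.thread_vc[t][t] += 1

    def read(self, t, addr):
        # A read races with any prior write that is not ordered before it.
        if not self._ordered_before(self.write_vc[addr], t):
            self.races.append(("write-read", addr, t))
        self.read_vc[addr][t] = self.thread_vc[t][t]

    def write(self, t, addr):
        # A write races with any unordered prior write or read.
        if not self._ordered_before(self.write_vc[addr], t):
            self.races.append(("write-write", addr, t))
        if not self._ordered_before(self.read_vc[addr], t):
            self.races.append(("read-write", addr, t))
        self.write_vc[addr][t] = self.thread_vc[t][t]


# Usage: two writes ordered by a lock, then two writes with no synchronization.
locked = HBDetector(2)
locked.acquire(0, "m"); locked.write(0, "x"); locked.release(0, "m")
locked.acquire(1, "m"); locked.write(1, "x"); locked.release(1, "m")

unlocked = HBDetector(2)
unlocked.write(0, "y")
unlocked.write(1, "y")

print(locked.races)    # no race expected: the lock orders the two writes
print(unlocked.races)  # one write-write race expected on "y"
```

In this sketch every access consults per-address read and write clocks held in software. The abstract describes Radish as keeping a subset of such metadata in hardware so that most checks avoid software entirely; how that subset is chosen is not shown here.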
