CORD: cost-effective (and nearly overhead-free) order-recording and data race detection

Chip-multiprocessors are becoming the dominant vehicle for general-purpose processing, and parallel software will be needed to effectively utilize them. This parallel software is notoriously prone to synchronization bugs, which are often difficult to detect and repeat for debugging. While data race detection and order-recording for deterministic replay are useful in debugging such problems, only order-recording schemes are lightweight, whereas data race detection support scales poorly and degrades performance significantly. This paper presents our CORD (cost-effective order-recording and data race detection) mechanism. It is similar in cost to prior order-recording mechanisms, but costs considerably less then prior schemes for data race detection. CORD also has a negligible performance overhead (0.4% on average) and detects most dynamic manifestations of synchronization problems (77% on average). Overall, CORD is fast enough to run always (even in performance-sensitive production runs) and provides the support programmers need to deal with the complexities of writing, debugging, and maintaining parallel software for future multi-threaded and multi-core machines.

[1]  Colin J. Fidge,et al.  Logical time in distributed computing systems , 1991, Computer.

[2]  Koen De Bosschere,et al.  Clock snooping and its application in on-the-fly data race detection , 1997, Proceedings of the 1997 International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN'97).

[3]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multi-threaded programs , 1997, TOCS.

[4]  James R. Larus,et al.  Protocol-based data-race detection , 1998, SPDT '98.

[5]  Min Xu,et al.  A serializability violation detector for shared-memory server programs , 2005, PLDI '05.

[6]  Koen De Bosschere,et al.  RecPlay: a fully integrated practical record/replay system , 1999, TOCS.

[7]  Jong-Deok Choi,et al.  Race Frontier: reproducing data races in parallel-program debugging , 1991, PPOPP '91.

[8]  Luk Levrouw,et al.  Space efficient data race detection for parallel programs with series-parallel task graphs , 1995, Proceedings Euromicro Workshop on Parallel and Distributed Processing.

[9]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[10]  Srikanth Kandula,et al.  Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging , 2004, USENIX Annual Technical Conference, General Track.

[11]  Wi N Dows FLIGHT DATA RECORDER FOR , 2007 .

[12]  Barton P. Miller,et al.  Detecting Data Races on Weak Memory Systems , 1991, ISCA.

[13]  Edith Schonberg,et al.  An empirical comparison of monitoring algorithms for access anomaly detection , 2011, PPOPP '90.

[14]  Barton P. Miller,et al.  Improving the accuracy of data race detection , 1991, PPOPP '91.

[15]  Dennis Shasha,et al.  Efficient and correct execution of parallel programs that share memory , 1988, TOPL.

[16]  Jong-Deok Choi,et al.  An efficient cache-based access anomaly detection scheme , 1991, ASPLOS IV.

[17]  M. Hill,et al.  Weak ordering-a new definition , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[18]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[19]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multithreaded programs , 1997, TOCS.

[20]  Ken Kennedy,et al.  Parallel program debugging with on-the-fly anomaly detection , 1990, Proceedings SUPERCOMPUTING '90.

[21]  Barton P. Miller,et al.  What are race conditions?: Some issues and formalizations , 1992, LOPL.

[22]  Celine Valot,et al.  Characterizing the accuracy of distributed timestamps , 1993, PADD '93.

[23]  Ken Kennedy,et al.  Parascope:a Parallel Programming Environment , 1988 .

[24]  Jong-Deok Choi,et al.  Efficient and precise datarace detection for multithreaded object-oriented programs , 2002, PLDI '02.

[25]  Josep Torrellas,et al.  ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors , 2002, ISCA.

[26]  Peter J. Keleher,et al.  A Protocol-Centric Approach to on-the-Fly Race Detection , 2000, IEEE Trans. Parallel Distributed Syst..

[27]  Satish Narayanasamy,et al.  BugNet: continuously recording program execution for deterministic replay debugging , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[28]  Josep Torrellas,et al.  ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes , 2003, ISCA '03.

[29]  Min Xu,et al.  A "flight data recorder" for enabling full-system multiprocessor deterministic replay , 2003, ISCA '03.