Toward Integration of Data Race Detection in DSM Systems

We present a distributed algorithm, called DJIT, for detecting data races in DSM systems. DJIT is designed as a DSM add-on that detects a race condition as soon as one is created. It instantly displays to the user the precise place in the program where the race occurred. There are no false detections, and no data races are missed. We have implemented DJIT on top of Millipage, a fine-granularity, page-based DSM system. Our implementation makes novel use of the operating system protection mechanisms. In particular, we propose a protection cache, which can be used for local logging of accesses to variables. As a result, our implementation does not increase the message complexity of the execution, since it piggybacks all its communication activity on top of the DSM-related messages. The performance figures show that our data race detection mechanism has only a minor influence on performance. The measured overheads, averaging only a few percent, are two orders of magnitude smaller than those achieved in previous work. Thus, our technique makes the integration of on-the-fly data race detection during regular DSM execution feasible for the first time.
