What is the cost of weak determinism?

We analyze the fundamental performance impact of enforcing a fixed order of synchronization operations to achieve weak deterministic execution. Our analysis has three parts, all performed on a real system using the SPLASH-2 and PARSEC benchmarks. First, we quantify the impact of various sources of nondeterminism on the execution of data-race-free programs. We find that thread synchronization is the prevalent source of nondeterminism, sometimes affecting program output. Second, we separate the implementation overhead of a system that imposes a specific synchronization order from the impact of enforcing that order. We show that this fundamental cost of determinism is small (a slowdown of 4% on average and 32% in the worst case), and we identify the application characteristics responsible for it. Finally, we evaluate this cost under perturbed execution conditions. We find that demanding determinism when threads face such conditions can cause a slowdown of almost 2×.
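To make "a fixed order of synchronization operations" concrete, the following is a minimal sketch, not the mechanism evaluated in this work. It assumes a simple round-robin policy in which threads may acquire a lock only in ascending thread-id order; the names `RoundRobinLock` and `run_once` are hypothetical and chosen for illustration only.

```python
import threading


class RoundRobinLock:
    """Illustrative lock that admits threads in a fixed round-robin order.

    Assumption: every participating thread acquires the lock the same
    number of times; otherwise the remaining threads would wait forever.
    """

    def __init__(self, num_threads):
        self._cond = threading.Condition()
        self._turn = 0            # id of the thread allowed to acquire next
        self._n = num_threads

    def acquire(self, tid):
        # Block until it is this thread's turn, regardless of arrival time.
        with self._cond:
            while self._turn != tid:
                self._cond.wait()

    def release(self, tid):
        # Hand the turn to the next thread id and wake all waiters.
        with self._cond:
            self._turn = (tid + 1) % self._n
            self._cond.notify_all()


def run_once(num_threads=4, rounds=3):
    """Run the workers and return the order in which they held the lock."""
    lock = RoundRobinLock(num_threads)
    order = []

    def worker(tid):
        for _ in range(rounds):
            lock.acquire(tid)
            order.append(tid)     # critical section
            lock.release(tid)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

Under this policy, every run records the same acquisition order, whatever the OS scheduler does. The sketch also hints at where a cost can come from: a thread that arrives early still has to wait for its turn, so a delay in one thread can stall the others.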
