A case for an SC-preserving compiler

The most intuitive memory consistency model for shared-memory multi-threaded programming is sequential consistency (SC). However, current concurrent programming languages support a relaxed model, as such relaxations are deemed necessary for enabling important optimizations. This paper demonstrates that an SC-preserving compiler, one that ensures that every SC behavior of a compiler-generated binary is an SC behavior of the source program, retains most of the performance benefits of an optimizing compiler. The key observation is that a large class of optimizations crucial for performance are either already SC-preserving or can be modified to preserve SC while retaining much of their effectiveness. An SC-preserving compiler, obtained by restricting the optimization phases in LLVM, a state-of-the-art C/C++ compiler, incurs an average slowdown of 3.8% and a maximum slowdown of 34% on a set of 30 programs from the SPLASH-2, PARSEC, and SPEC CINT2006 benchmark suites. While the performance overhead of preserving SC in the compiler is much less than previously assumed, it might still be unacceptable for certain applications. We believe there are several avenues for improving performance without giving up SC-preservation. In this vein, we observe that the overhead of our SC-preserving compiler arises mainly from its inability to aggressively perform a class of optimizations we identify as eager-load optimizations. This class includes common-subexpression elimination, constant propagation, global value numbering, and common cases of loop-invariant code motion. We propose a notion of interference checks in order to enable eager-load optimizations while preserving SC. Interference checks expose to the compiler a commonly used hardware speculation mechanism that can efficiently detect whether a particular variable has changed its value since last read.

[1]  Peter Sewell,et al.  A Better x86 Memory Model: x86-TSO , 2009, TPHOLs.

[2]  Karin Strauss,et al.  Conflict Exceptions: Providing Simple Parallel Language Semantics with Precise Hardware Exceptions , 2009 .

[3]  Scott A. Mahlke,et al.  Dynamic memory disambiguation using the memory conflict buffer , 1994, ASPLOS VI.

[4]  Thomas F. Wenisch,et al.  InvisiFence: performance-transparent memory ordering in conventional multiprocessors , 2009, ISCA '09.

[5]  Kunle Olukotun,et al.  Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[6]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[7]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[8]  Rajiv Gupta,et al.  Speculative Optimizations for Parallel Programs on Multicores , 2009, LCPC.

[9]  Rajiv Gupta,et al.  Efficient sequential consistency using conditional fences , 2010, PACT '10.

[10]  Trevor N. Mudge,et al.  The store-load address table and speculative register promotion , 2000, MICRO 33.

[11]  Stephen N. Freund,et al.  Type-based race detection for Java , 2000, PLDI '00.

[12]  Jeffrey S. Foster,et al.  LOCKSMITH: context-sensitive correlation analysis for race detection , 2006, PLDI '06.

[13]  Anoop Gupta,et al.  Two Techniques to Enhance the Performance of Memory Consistency Models , 1991, ICPP.

[14]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[15]  Sarita V. Adve,et al.  Shared Memory Consistency Models: A Tutorial , 1996, Computer.

[16]  Martin C. Rinard,et al.  A parameterized type system for race-free Java programs , 2001, OOPSLA '01.

[17]  Peter Sewell,et al.  Mathematizing C++ concurrency , 2011, POPL '11.

[18]  Stephen N. Freund,et al.  FastTrack: efficient and precise dynamic race detection , 2009, PLDI '09.

[19]  Brandon Lucia,et al.  Conflict exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races , 2010, ISCA.

[20]  Mark D. Hill,et al.  Weak ordering—a new definition , 1998, ISCA '98.

[21]  Hans-Juergen Boehm,et al.  Foundations of the C++ concurrency memory model , 2008, PLDI '08.

[22]  Sarita V. Adve,et al.  Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models , 1997, SPAA '97.

[23]  Sebastian Burckhardt,et al.  Verifying Local Transformations on Relaxed Memory Models , 2010, CC.

[24]  Martin C. Rinard,et al.  ACM Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA), November 2002 Ownership Types for Safe Programming: Preventing Data Races and Deadlocks , 2022 .

[25]  Mark D. Hill,et al.  Multiprocessors Should Support Simple Memory- Consistency Models Uniprocessor Memory , 1998 .

[26]  Jeremy Manson,et al.  The Java memory model , 2005, POPL '05.

[27]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[28]  Josep Torrellas,et al.  BulkCompiler: High-performance Sequential Consistency through cooperative compiler and hardware support , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[29]  Serdar Tasiran,et al.  Goldilocks: a race and transaction-aware java runtime , 2007, PLDI '07.

[30]  Robert H. B. Netzer,et al.  Detecting data races on weak memory systems , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.

[31]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[32]  Dennis Shasha,et al.  Efficient and correct execution of parallel programs that share memory , 1988, TOPL.

[33]  Suresh Jagannathan,et al.  Relaxed-memory concurrency and verified compilation , 2011, POPL '11.

[34]  Satish Narayanasamy,et al.  Unbounded page-based transactional memory , 2006, ASPLOS XII.

[35]  Brandon Lucia,et al.  A case for system support for concurrency exceptions , 2009 .

[36]  Leslie Lamport,et al.  How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 2016, IEEE Transactions on Computers.

[37]  Josep Torrellas,et al.  SigRace: signature-based data race detection , 2009, ISCA '09.

[38]  David A. Padua,et al.  Compiler techniques for high performance sequentially consistent java programs , 2005, PPOPP.

[39]  Satish Narayanasamy,et al.  DRFX: a simple and efficient memory model for concurrent programming languages , 2010, PLDI '10.

[40]  Hans-J. Boehm Simple thread semantics require race detection , 2009 .

[41]  J. Torrellas,et al.  ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[42]  Jimmy Su,et al.  Making Sequential Consistency Practical in Titanium , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[43]  Babak Falsafi,et al.  Speculative sequential consistency with little custom storage , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[44]  Jeffrey Overbey,et al.  A type and effect system for deterministic parallel Java , 2009, OOPSLA 2009.

[45]  Kenneth C. Yeager The Mips R10000 superscalar microprocessor , 1996, IEEE Micro.

[46]  David Aspinall,et al.  On Validity of Program Transformations in the Java Memory Model , 2008, ECOOP.

[47]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[48]  Josep Torrellas,et al.  BulkSC: bulk enforcement of sequential consistency , 2007, ISCA '07.