Mixed-size concurrency: ARM, POWER, C/C++11, and SC

Previous work on the semantics of relaxed shared-memory concurrency has considered only the case in which each load reads the data of exactly one store. In practice, however, multiprocessors support mixed-size accesses, and these are used by systems software and (to some degree) exposed at the C/C++ language level. A semantic foundation for software therefore has to address them. We investigate the mixed-size behaviour of the ARMv8 and IBM POWER architectures and implementations: by experiment, by developing semantic models, by testing the correspondence between these, and by discussion with ARM and IBM staff. This turns out to be surprisingly subtle, and along the way we have to revisit the fundamental concepts of coherence and sequential consistency, which change in this setting. In particular, we show that adding a memory barrier between every pair of instructions does not restore sequential consistency. We go on to extend the C/C++11 model to support non-atomic mixed-size memory accesses. This is a necessary step towards semantics for real-world shared-memory concurrent code, beyond litmus tests.
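
As a rough illustration of why the "each load reads exactly one store" assumption breaks down with mixed-size accesses, the following sketch models memory byte by byte and records which store supplied each byte. It is a single-threaded toy, not the paper's model: the helper names `store` and `load`, the store labels `"a"` and `"b"`, and the little-endian byte order are all assumptions made for this example.

```python
def store(memory, store_id, addr, size, value):
    """Write `size` bytes of `value` (little-endian), tagging each byte with its store."""
    for offset, byte in enumerate(value.to_bytes(size, "little")):
        memory[addr + offset] = (store_id, byte)


def load(memory, addr, size):
    """Read `size` bytes; return the value and the set of stores that supplied them."""
    cells = [memory[addr + offset] for offset in range(size)]
    value = int.from_bytes(bytes(byte for _, byte in cells), "little")
    sources = {store_id for store_id, _ in cells}
    return value, sources


# Hypothetical scenario: two one-byte stores to adjacent addresses,
# followed by a single two-byte load that covers both of them.
memory = {}
store(memory, "a", addr=0, size=1, value=0x11)
store(memory, "b", addr=1, size=1, value=0x22)
value, sources = load(memory, addr=0, size=2)
```

Here the two-byte load takes one byte from each store, so a reads-from relation between whole events no longer pairs each load with a single store; it has to be tracked at a finer granularity, such as per byte.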
