COATCheck: Verifying Memory Ordering at the Hardware-OS Interface

Modern computer systems include numerous compute elements, from CPUs to GPUs to accelerators. Harnessing their full potential requires well-defined, properly-implemented memory consistency models (MCMs), and low-level system functionality such as virtual memory and address translation (AT). Unfortunately, it is difficult to specify and implement hardware-OS interactions correctly; in the past, many hardware and OS specification mismatches have resulted in implementation bugs in commercial processors. In an effort to resolve this verification gap, this paper makes the following contributions. First, we present COATCheck, an address translation-aware framework for specifying and statically verifying memory ordering enforcement at the microarchitecture and operating system levels. We develop a domain-specific language for specifying ordering enforcement, for including ordering-related OS events and hardware micro-operations, and for programmatically enumerating happens-before graphs. Using a fast and automated static constraint solver, COATCheck can efficiently analyze interesting and important memory ordering scenarios for modern, high-performance, out-of-order processors. Second, we show that previous work on Virtual Address Memory Consistency (VAMC) does not capture every translation-related ordering scenario of interest, and that some such cases even fall outside the traditional scope of consistency. We therefore introduce the term transistency model to describe the superset of consistency which captures all translation-aware sets of ordering rules.

[1]  Albert Cohen,et al.  Correct and Efficient Bounded FIFO Queues , 2013, 2013 25th International Symposium on Computer Architecture and High Performance Computing.

[2]  David Aspinall Java Memory Model Examples: Good, Bad and Ugly , 2007 .

[3]  Anoop Gupta,et al.  Two Techniques to Enhance the Performance of Memory Consistency Models , 1991, ICPP.

[4]  Abhishek Bhattacharjee,et al.  Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces , 2013, ASPLOS.

[5]  Margaret Martonosi,et al.  CCICheck: Using μhb graphs to verify the coherence-consistency interface , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[6]  William Pugh The Java memory model is fatally flawed , 2000, Concurr. Pract. Exp..

[7]  Peter Sewell,et al.  Mathematizing C++ concurrency , 2011, POPL '11.

[8]  Donald W. Loveland,et al.  A machine program for theorem-proving , 2011, CACM.

[9]  Leslie Lamport,et al.  How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 2016, IEEE Transactions on Computers.

[10]  Aamer Jaleel,et al.  CoLT: Coalesced Large-Reach TLBs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[11]  Avi Mendelson,et al.  DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[12]  M. Hill,et al.  Weak ordering-a new definition , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[13]  No License,et al.  Intel ® 64 and IA-32 Architectures Software Developer ’ s Manual Volume 3 A : System Programming Guide , Part 1 , 2006 .

[14]  Abhishek Bhattacharjee,et al.  Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces , 2014, ASPLOS.

[15]  Rajeev Alur,et al.  An Axiomatic Memory Model for POWER Multiprocessors , 2012, CAV.

[16]  Daniel J. Sorin,et al.  Specifying and dynamically verifying address translation-aware memory consistency , 2010, ASPLOS XV.

[17]  Boris Grot,et al.  International Symposium on Computer Architecture (ISCA) , 2017, ISCA 2017.

[18]  Albert Cohen,et al.  Correct and efficient work-stealing for weak memory models , 2013, PPoPP '13.

[19]  Per Stenström,et al.  Recency-based TLB preloading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[20]  Jade Alglave,et al.  Understanding POWER multiprocessors , 2011, PLDI '11.

[21]  Peter Sewell,et al.  Clarifying and compiling C/C++ concurrency: from C++11 to POWER , 2012, POPL '12.

[22]  Margaret Martonosi,et al.  Verifying Correct Microarchitectural Enforcement of Memory Consistency Models , 2015, IEEE Micro.

[23]  Benedict R. Gaster HSA memory model , 2013, 2013 IEEE Hot Chips 25 Symposium (HCS).

[24]  Daniel J. Sorin,et al.  UNified Instruction/Translation/Data (UNITD) coherence: One protocol to rule them all , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[25]  Margaret Martonosi,et al.  Inter-core cooperative TLB for chip multiprocessors , 2010, ASPLOS XV.

[26]  David A. Wood,et al.  Supporting x86-64 address translation for 100s of GPU lanes , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[27]  Daniel J. Sorin,et al.  Address Translation Aware Memory Consistency , 2011, IEEE Micro.

[28]  Jade Alglave,et al.  A formal hierarchy of weak memory models , 2012, Formal Methods in System Design.

[29]  Osman S. Unsal,et al.  Redundant Memory Mappings for fast access to large memories , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[30]  Peter Sewell,et al.  A Better x86 Memory Model: x86-TSO , 2009, TPHOLs.

[31]  David Lang,et al.  N4215: Towards Implementation and Use of memory order consume , 2014 .

[32]  Ganesh Gopalakrishnan,et al.  GPU Concurrency: Weak Behaviours and Programming Assumptions , 2015, ASPLOS.

[33]  William Pugh The Java memory model is fatally flawed , 2000 .

[34]  Hans-Juergen Boehm,et al.  Foundations of the C++ concurrency memory model , 2008, PLDI '08.

[35]  Jeremy Manson,et al.  The Java memory model , 2005, POPL '05.

[36]  Christine Paulin-Mohring,et al.  The coq proof assistant reference manual , 2000 .

[37]  Margaret Martonosi,et al.  PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.