ArMOR: Defending against memory consistency model mismatches in heterogeneous architectures

Architectural heterogeneity is increasing: numerous products and studies have proven the benefits of combining cores and accelerators with varying ISAs into a single system. However, an underappreciated barrier to unlocking the full potential of heterogeneity is the need to specify and to reconcile differences in memory consistency models across layers of the hardware-software stack and among on-chip components. This paper presents ArMOR, a framework for specifying, comparing, and translating between memory consistency models. ArMOR defines MOSTs, an architecture-independent and precise format for specifying the semantics of memory ordering requirements such as preserved program order or explicit fences. MOSTs allow any two consistency models to be directly and algorithmically compared, and they help avoid many of the pitfalls of traditional consistency model analysis. As a case study, we use ArMOR to automatically generate translation modules called shims that dynamically translate code compiled for one memory model to execute on hardware implementing a different model.

[1]  Robert E. Tarjan,et al.  Three Partition Refinement Algorithms , 1987, SIAM J. Comput..

[2]  Dennis Shasha,et al.  Efficient and correct execution of parallel programs that share memory , 1988, TOPL.

[3]  M. Hill,et al.  Weak ordering-a new definition , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[4]  Anoop Gupta,et al.  Two Techniques to Enhance the Performance of Memory Consistency Models , 1991, ICPP.

[5]  David L Weaver,et al.  The SPARC architecture manual : version 9 , 1994 .

[6]  V AdveSarita,et al.  Shared Memory Consistency Models , 1996 .

[7]  Sarita V. Adve,et al.  Shared Memory Consistency Models: A Tutorial , 1996, Computer.

[8]  Arvind,et al.  Commit-Reconcile and Fences (CRF): a new memory model for architects and compiler writers , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).

[9]  Larry Rudolph,et al.  Commit-reconcile & fences (CRF): a new memory model for architects and compiler writers , 1999, ISCA.

[10]  Jaejin Lee,et al.  Hiding relaxed memory consistency with compilers , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).

[11]  Michael Gschwind,et al.  Binary translation and architecture convergence issues for IBM system/390 , 2000, ICS '00.

[12]  David Seal,et al.  ARM Architecture Reference Manual , 2001 .

[13]  Jaejin Lee,et al.  Hiding relaxed memory consistency with a compiler , 2001 .

[14]  Balaram Sinharoy,et al.  POWER4 system microarchitecture , 2002, IBM J. Res. Dev..

[15]  Tevi Devor,et al.  IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium®-based systems , 2003, MICRO.

[16]  Yun Wang,et al.  IA-32 execution layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium/spl reg/-based systems , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[17]  Jeremy Manson,et al.  The Java memory model , 2005, POPL '05.

[18]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[19]  David A. Padua,et al.  Compiler techniques for high performance sequentially consistent java programs , 2005, PPOPP.

[20]  No License,et al.  Intel ® 64 and IA-32 Architectures Software Developer ’ s Manual Volume 3 A : System Programming Guide , Part 1 , 2006 .

[21]  Lisa Higham,et al.  Translating between itanium and sparc memory consistency models , 2006, SPAA '06.

[22]  Arvind,et al.  Memory Model = Instruction Reordering + Store Atomicity , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[23]  Sebastian Burckhardt,et al.  CheckFence: checking consistency of concurrent data types on relaxed memory models , 2007, PLDI '07.

[24]  Jason N. Dale,et al.  Cell Broadband Engine Architecture and its first implementation - A performance view , 2007, IBM J. Res. Dev..

[25]  CheckFence: checking consistency of concurrent data types on relaxed memory models , 2007, PLDI.

[26]  Thuan Quang Huynh,et al.  Memory model sensitive bytecode verification , 2007, Formal Methods Syst. Des..

[27]  Hans-Juergen Boehm,et al.  Foundations of the C++ concurrency memory model , 2008, PLDI '08.

[28]  Avi Mendelson,et al.  Programming model for a heterogeneous x86 platform , 2009, PLDI '09.

[29]  Francesco Zappa Nardelli,et al.  The semantics of power and ARM multiprocessor machine code , 2009, DAMP '09.

[30]  Peter Sewell,et al.  A Better x86 Memory Model: x86-TSO , 2009, TPHOLs.

[31]  Francesco Zappa Nardelli,et al.  Relaxed memory models must be rigorous , 2009 .

[32]  Aamer Jaleel,et al.  Analyzing Parallel Programs with PIN , 2010, Computer.

[33]  Jade Alglave,et al.  Fences in Weak Memory Models , 2010, CAV.

[34]  Sanjay J. Patel,et al.  Cohesion: a hybrid memory model for accelerators , 2010, ISCA.

[35]  Eran Yahav,et al.  Automatic inference of memory fences , 2010, Formal Methods in Computer Aided Design.

[36]  John E. Stone,et al.  An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS XV.

[37]  Bradford M. Beckmann,et al.  The gem5 simulator , 2011, CARN.

[38]  David A. Wood,et al.  A Primer on Memory Consistency and Cache Coherence , 2012, Synthesis Lectures on Computer Architecture.

[39]  Sarita V. Adve,et al.  DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[40]  Jade Alglave,et al.  Understanding POWER multiprocessors , 2011, PLDI '11.

[41]  Christian Bienia,et al.  Benchmarking modern multiprocessors , 2011 .

[42]  Viktor Vafeiadis,et al.  Verifying Fence Elimination Optimisations , 2011, SAS.

[43]  Satish Narayanasamy,et al.  End-to-end sequential consistency , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[44]  Rajeev Alur,et al.  An Axiomatic Memory Model for POWER Multiprocessors , 2012, CAV.

[45]  Peter Sewell,et al.  Clarifying and compiling C/C++ concurrency: from C++11 to POWER , 2012, POPL '12.

[46]  Jade Alglave,et al.  A formal hierarchy of weak memory models , 2012, Formal Methods in System Design.

[47]  Dean M. Tullsen,et al.  Execution migration in a heterogeneous-ISA chip multiprocessor , 2012, ASPLOS XVII.

[48]  Margaret Martonosi,et al.  Reducing GPU offload latency via fine-grained CPU-GPU synchronization , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[49]  Sarita V. Adve,et al.  DeNovoND: efficient hardware support for disciplined non-determinism , 2013, ASPLOS '13.

[50]  Josep Torrellas,et al.  WeeFence: toward making fences free in TSO , 2013, ISCA.

[51]  Correct and efficient work-stealing for weak memory models , 2013, PPOPP.

[52]  Albert Cohen,et al.  Correct and efficient work-stealing for weak memory models , 2013, PPoPP '13.

[53]  Suresh Jagannathan,et al.  CompCertTSO: A Verified Compiler for Relaxed-Memory Concurrency , 2013, JACM.

[54]  Margaret Martonosi,et al.  PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[55]  Thomas F. Wenisch,et al.  Memory persistency , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[56]  Herding cats: modelling, simulation, testing, and data-mining for weak memory , 2014, PLDI 2014.

[57]  Dean M. Tullsen,et al.  Harnessing ISA diversity: Design of a heterogeneous-ISA chip multiprocessor , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[58]  Ganesh Gopalakrishnan,et al.  GPU Concurrency: Weak Behaviours and Programming Assumptions , 2015, ASPLOS.

[59]  Eric S. Chung,et al.  A reconfigurable fabric for accelerating large-scale datacenter services , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).