ICS: U: Towards Shared Memory Consistency Models for GPUs

A Graphics Processing Unit (GPU) is a microprocessor designed for accelerated computation, with many cores and high data bandwidth [12, pp. 3-5]. These devices were originally used for graphics acceleration; however, their high arithmetic throughput and energy efficiency made them attractive for other applications. In 2006, NVIDIA released its first general-purpose GPU supporting the CUDA architecture [19, p. 6], which made it easier for programmers to develop applications that run on GPUs. Since then, GPUs have been used in many applications and are present in devices ranging from the top supercomputers [22] to smartphones and tablets [27].

[1] Leslie Lamport, et al. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, 2016, IEEE Transactions on Computers.

[2] William W. Collier, et al. Reasoning about parallel architectures, 1992.

[3] Brian Case, et al. SPARC architecture, 1992.

[4] David L. Weaver, et al. The SPARC architecture manual: version 9, 1994.

[5] David L. Dill, et al. The Murphi Verification System, 1996, CAV.

[6] John J. Cannon, et al. The Magma Algebra System I: The User Language, 1997, J. Symb. Comput.

[7] Janak H. Patel, et al. A low-overhead coherence solution for multiprocessors with private cache memories, 1984, ISCA '84.

[8] Sridhar Narayanan, et al. TSOtool: a program for verifying memory systems using the memory consistency model, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.

[9] Peter Sewell, et al. A Better x86 Memory Model: x86-TSO, 2009, TPHOLs.

[10] Jie Cheng, et al. Programming Massively Parallel Processors. A Hands-on Approach, 2010, Scalable Comput. Pract. Exp.

[11] Jade Alglave, et al. Fences in Weak Memory Models, 2010, CAV.

[12] Jie Cheng, et al. CUDA by Example: An Introduction to General-Purpose GPU Programming, 2010, Scalable Comput. Pract. Exp.

[13] Francesco Zappa Nardelli, et al. x86-TSO, 2010, Commun. ACM.

[14] David A. Wood, et al. A Primer on Memory Consistency and Cache Coherence, 2012, Synthesis Lectures on Computer Architecture.

[15] Jade Alglave, et al. Litmus: Running Tests against Hardware, 2011, TACAS.

[16] Jade Alglave, et al. Understanding POWER multiprocessors, 2011, PLDI '11.

[17] John D. Owens, et al. Efficient Synchronization Primitives for GPUs, 2011, ArXiv.

[18] Rajeev Alur, et al. An Axiomatic Memory Model for POWER Multiprocessors, 2012, CAV.

[19] David A. Wood, et al. Sequential Consistency for Heterogeneous-Race-Free: Programmer-centric Memory Models for Heterogeneous Platforms, 2013.

[20] Ganesh Gopalakrishnan, et al. Towards shared memory consistency models for GPUs, 2013, ICS '13.

[21] Daniel J. Sorin, et al. Exploring memory consistency for massively-threaded throughput-oriented processors, 2013, ISCA.

[22] David A. Wood, et al. Heterogeneous-race-free memory models, 2014, ASPLOS.

[23] Herding Cats, 2013, ACM Trans. Program. Lang. Syst.