Forward progress on GPU concurrency

The tutorial at CONCUR will provide a practical overview of work undertaken over the last six years in the Multicore Programming Group at Imperial College London, together with international collaborators, on understanding and reasoning about concurrency in software designed for acceleration on GPUs. This article summarises that work, which includes contributions to data race analysis, compiler testing, memory model understanding and formalisation, and, most recently, efforts to enable portable GPU implementations of algorithms that require forward progress guarantees.
