A Hoare Logic for SIMT Programs

We study a Hoare logic for reasoning about GPU kernels, which are parallel programs executed on GPUs. We consider the SIMT (Single Instruction, Multiple Threads) execution model, in which multiple threads execute in lockstep (that is, all threads execute the same instruction at the same time). When control branches, both branches are executed sequentially, but during the execution of each branch only those threads that take it are enabled; after control converges, all threads are enabled and execute in lockstep again. In this paper we adapt Hoare logic to the SIMT setting by adding to the usual Hoare triples an extra component representing the set of enabled threads. It turns out that soundness and relative completeness do not hold for all programs; the difficulty is that one thread can invalidate the loop termination condition of another thread through shared memory. We overcome this difficulty by identifying an appropriate class of programs for which soundness and relative completeness hold.
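
The branching behaviour described above can be illustrated with a small simulation. The following Python sketch is not the formal semantics of the paper; it is a minimal model under stated assumptions: each thread has only local variables, the set of enabled threads is represented as a Python set, and the names `run_if`, `then_body`, and `else_body` are hypothetical helpers introduced here for illustration.

```python
def run_if(enabled, state, cond, then_body, else_body):
    """Simulate an SIMT conditional for the threads in `enabled`.

    enabled: set of thread ids that are enabled on entry (assumed representation)
    state:   dict mapping thread id -> dict of that thread's local variables
    """
    # Each enabled thread evaluates the condition on its own local state.
    taken = {t for t in enabled if cond(state[t])}
    skipped = enabled - taken
    # Both branches are executed one after the other; while a branch runs,
    # only the threads that take it are enabled.
    for mask, body in ((taken, then_body), (skipped, else_body)):
        for t in sorted(mask):
            body(state[t])
    # After control converges, the threads enabled on entry are enabled again.
    return enabled


def then_body(local):
    local["y"] = local["x"] * 10


def else_body(local):
    local["y"] = -local["x"]


threads = {0, 1, 2, 3}
state = {t: {"x": t, "y": None} for t in threads}
after = run_if(threads, state, lambda l: l["x"] % 2 == 0, then_body, else_body)
ys = [state[t]["y"] for t in sorted(threads)]
print(ys)  # [0, -1, 20, -3]
```

Because the branch bodies here touch only thread-local variables, the order in which the simulation visits the enabled threads does not matter. With shared memory this no longer holds in general, which is the source of the difficulty noted in the abstract.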
