Interleaving and Lock-Step Semantics for Analysis and Verification of GPU Kernels

We study semantics of GPU kernels -- the parallel programs that run on Graphics Processing Units (GPUs). We provide a novel lock-step execution semantics for GPU kernels represented by arbitrary reducible control flow graphs and compare this semantics with a traditional interleaving semantics. We show for terminating kernels that either both semantics compute identical results or both behave erroneously. The result induces a method that allows GPU kernels with arbitrary reducible control flow graphs to be verified via transformation to a sequential program that employs predicated execution. We implemented this method in the GPUVerify tool and experimentally evaluated it by comparing the tool with the previous version of the tool based on a similar method for structured programs, i.e., where control is organised using if and while statements. The evaluation was based on a set of 163 open source and commercial GPU kernels. Among these kernels, 42 exhibit unstructured control flow which our novel method can handle fully automatically, but the previous method could not. Overall the generality of the new method comes at a modest price: Verification across our benchmark set was 2.25 times slower overall; however, the median slow down across all kernels was 0.77, indicating that our novel technique yielded faster analysis in many cases.

[1]  Nikolaj Bjørner,et al.  Z3: An Efficient SMT Solver , 2008, TACAS.

[2]  Axel Habermaier,et al.  The Model of Computation of CUDA and its Formal Semantics , 2011 .

[3]  David R. Kaeli,et al.  Accelerating the local outlier factor algorithm on a GPU for intrusion detection systems , 2010, GPGPU-3.

[4]  Bor-Yuh Evan Chang,et al.  Boogie: A Modular Reusable Verifier for Object-Oriented Programs , 2005, FMCO.

[5]  Adam Betts,et al.  GPUVerify: a verifier for GPU kernels , 2012, OOPSLA '12.

[6]  Haibo Chen,et al.  A GPU-based high-throughput image retrieval algorithm , 2012, GPGPU-5.

[7]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[8]  Peng Li,et al.  GKLEE: concolic verification and test generation for GPUs , 2012, PPoPP '12.

[9]  Sorin Lerner,et al.  Verifying GPU kernels by test amplification , 2012, PLDI.

[10]  Brian Campbell,et al.  Amortised Memory Analysis Using the Depth of Data Structures , 2009, ESOP.

[11]  Richard J. Lipton,et al.  Space-Time Trade-Offs in Structured Programming: An Improved Combinatorial Embedding Theorem , 1980, JACM.

[12]  Ken Kennedy,et al.  Conversion of control dependence to data dependence , 1983, POPL '83.

[13]  Marcello M. Bonsangue,et al.  Formal Methods for Components and Objects - 8th International Symposium, FMCO 2009, Eindhoven, The Netherlands, November 4-6, 2009. Revised Selected Papers , 2010, FMCO.

[14]  Paul H. J. Kelly,et al.  Symbolic Testing of OpenCL Code , 2011, Haifa Verification Conference.

[15]  Guodong Li,et al.  Scalable SMT-based verification of GPU kernel functions , 2010, FSE '10.

[16]  Rajeev Alur,et al.  A Temporal Logic of Nested Calls and Returns , 2004, TACAS.

[17]  K. Rustan M. Leino,et al.  Weakest-precondition of unstructured programs , 2005, PASTE '05.

[18]  Leslie Lamport,et al.  What Good is Temporal Logic? , 1983, IFIP Congress.

[19]  Alexander Knapp,et al.  On the Correctness of the SIMT Execution Model of GPUs , 2012, ESOP.

[20]  Tor M. Aamodt,et al.  Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[21]  Alfred V. Aho,et al.  , “Compilers- Principles, Techniques, and Tools”, Pearson Education Asia, 2007. , 2015 .