Checking Data-Race Freedom of GPU Kernels, Compositionally

GPUs offer parallelism as a commodity, but they are difficult to program correctly. Static analyzers that guarantee data-race freedom (DRF) are essential to help programmers establish the correctness of their programs (kernels). However, existing approaches produce too many false alarms and struggle to handle larger programs. To address these limitations, we formalize a novel compositional analysis for DRF, based on access memory protocols. These protocols are behavioral types that codify the way threads interact over shared memory. Our work includes fully mechanized proofs of our theoretical results, the first mechanized proofs in the field of DRF analysis for GPU kernels. Our theory is implemented in Faial, a tool that outperforms the state of the art. Notably, it correctly verifies at least 1.42× more real-world kernels, and it exhibits linear growth in 4 out of 5 experiments, whereas the other tools grow exponentially in all 5 experiments.
