Formal Analysis of GPU Programs with Atomics via Conflict-Directed Delay-Bounding

GPU-based computing has made significant strides in recent years. Unfortunately, GPU program optimizations can introduce subtle concurrency errors, so incisive formal bug-hunting methods are essential. This paper presents a new formal bug-hunting method for GPU programs that combine barriers and atomics: the conflict-directed delay-bounded scheduling algorithm (CD), which exploits conflicts among atomic synchronization commands to trigger the generation of alternate schedules. These alternate schedules are then executed in a delay-bounded manner. We formally describe CD and present two correctness-checking methods, one based on final-state comparison and the other on user assertions. We evaluate our implementation on realistic GPU benchmarks, with encouraging results.
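The Python sketch below is a small, hypothetical model of the idea summarized above. It is not the authors' CD implementation. The `Atomic` type, the `explore` function, and the exact conflict rule are assumptions made for illustration. Threads within a single barrier interval are modeled as lists of atomic read-modify-write operations. A deterministic round-robin scheduler runs each thread to completion, and a "delay" skips the thread it would otherwise pick, at most `bound` times per schedule. A delay is spent only when the skipped thread's next atomic targets a location that the substitute thread will still access. The zero-delay schedule serves as the reference for final-state comparison, and an optional predicate stands in for a user assertion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Memory = Dict[str, int]
Results = Tuple[Tuple[int, ...], ...]            # values returned to each thread
Outcome = Tuple[Tuple[int, ...], Memory, Results]  # (schedule, memory, results)


@dataclass(frozen=True)
class Atomic:
    """Hypothetical atomic RMW: returns the old value and stores update(old)."""
    loc: str
    update: Callable[[int], int]


def explore(threads: List[List[Atomic]], init: Memory, bound: int,
            assertion: Optional[Callable[[Memory, Results], bool]] = None):
    n = len(threads)
    finals: List[Outcome] = []

    def step(pcs, mem, results, cur, delays, sched):
        enabled = [t for t in range(n) if pcs[t] < len(threads[t])]
        if not enabled:
            finals.append((tuple(sched), mem, tuple(tuple(r) for r in results)))
            return
        # Deterministic round-robin order, starting from the thread that ran last.
        order = sorted(enabled, key=lambda t: (t - cur) % n)
        for skips, t in enumerate(order):
            if delays + skips > bound:
                break  # delay budget exhausted
            remaining_locs = {op.loc for op in threads[t][pcs[t]:]}
            # Assumed conflict rule: delay a thread only if its next atomic
            # touches a location that thread t will still access.
            if any(threads[u][pcs[u]].loc not in remaining_locs for u in order[:skips]):
                continue
            op = threads[t][pcs[t]]
            old = mem[op.loc]
            new_mem = {**mem, op.loc: op.update(old)}
            new_results = [list(r) for r in results]
            new_results[t].append(old)  # atomics return the old value
            new_pcs = list(pcs)
            new_pcs[t] += 1
            step(new_pcs, new_mem, new_results, t, delays + skips, sched + [t])

    step([0] * n, dict(init), [[] for _ in range(n)], 0, 0, [])
    reference = finals[0]  # depth-first search reaches the zero-delay schedule first
    divergent = [f for f in finals if f[1:] != reference[1:]]  # final-state check
    violations = [f for f in finals if assertion and not assertion(f[1], f[2])]
    return finals, divergent, violations


# Hypothetical buggy producer that sets `flag` before writing `data`, and a
# consumer that reads both locations with atomicAdd(x, 0)-style reads.
producer = [Atomic("flag", lambda v: 1), Atomic("data", lambda v: v + 5)]
consumer = [Atomic("flag", lambda v: v), Atomic("data", lambda v: v)]


def published_data_visible(mem: Memory, results: Results) -> bool:
    flag_seen, data_seen = results[1]
    return flag_seen == 0 or data_seen == 5


finals, divergent, violations = explore([producer, consumer], {"flag": 0, "data": 0},
                                        bound=1, assertion=published_data_visible)
print([f[0] for f in finals])      # [(0, 0, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0)]
print([f[0] for f in violations])  # [(0, 1, 1, 0)]
```

In this toy run, final-state comparison flags both alternate schedules because the consumer's observed values differ from the canonical run. The assertion, by contrast, singles out schedule `(0, 1, 1, 0)`, in which the consumer sees the flag set but reads stale data. Threads that touch disjoint locations never trigger a delay, so they yield a single schedule whatever the value of `bound`.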
