Simulee: Detecting CUDA Synchronization Bugs via Memory-Access Modeling

Although CUDA has become a mainstream parallel computing platform and programming model for general-purpose GPU computing, detecting CUDA synchronization bugs effectively and efficiently remains an open problem. In this paper, we propose Simulee, the first lightweight framework for detecting CUDA synchronization bugs. Simulee models CUDA program execution by interpreting the corresponding LLVM bytecode and collects memory-access information, which it uses to automatically detect general CUDA synchronization bugs. To evaluate the effectiveness and efficiency of Simulee, we construct a benchmark from 7 popular CUDA-related projects on GitHub and conduct an extensive set of experiments on it. The experimental results suggest that Simulee detects 21 of the 24 bugs manually identified in our preliminary study, as well as 24 previously unknown bugs across all projects, 10 of which have already been confirmed by the developers. Furthermore, Simulee significantly outperforms state-of-the-art approaches for CUDA synchronization bug detection.
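
To make the idea of detecting synchronization bugs from memory-access information concrete, the following is a minimal sketch in Python. It is not Simulee's algorithm: the `Access` record, the `epoch` counter (the number of barriers a thread has passed), and the `find_races` helper are all hypothetical names introduced here for illustration. The sketch assumes a single thread block, and it ignores atomics and warp-level lock-step execution. Under those assumptions, a data race is reported when two different threads touch the same address between the same pair of barriers and at least one of them writes.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    """One recorded memory access (hypothetical record layout)."""
    thread: int     # thread index within a single block
    epoch: int      # number of barriers this thread has passed so far
    address: int    # shared-memory address that was touched
    is_write: bool  # True for a store, False for a load


def find_races(accesses):
    """Return sorted (epoch, address) pairs that have a conflicting access pair."""
    groups = defaultdict(list)
    for acc in accesses:
        groups[(acc.epoch, acc.address)].append(acc)

    races = set()
    for key, group in groups.items():
        writers = {a.thread for a in group if a.is_write}
        threads = {a.thread for a in group}
        # Conflict: at least one write, and more than one thread involved.
        if writers and len(threads) > 1:
            races.add(key)
    return sorted(races)


# Thread 0 writes address 0 and thread 1 reads it with no barrier in between.
racy_trace = [Access(0, 0, 0, True), Access(1, 0, 0, False)]

# Same accesses, but thread 1 reads only after passing a barrier (epoch 1).
fixed_trace = [Access(0, 0, 0, True), Access(1, 1, 0, False)]

print(find_races(racy_trace))
print(find_races(fixed_trace))
```

The first trace stands for a kernel that is missing a barrier between a write and a read by another thread, so the pair falls into the same epoch and is flagged. In the second trace the read happens one epoch later, and nothing is reported.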
