Towards Automatic Restrictification of CUDA Kernel Arguments

Many procedural languages, such as C and C++, have pointers. Pointers are powerful and convenient, but pointer aliasing still hinders compiler optimizations, despite several years of research on alias analysis. Because alias analysis is difficult and its results are not always precise, the ISO C99 standard added the restrict keyword, which lets the programmer declare that pointers do not alias, as an aid to the compiler's optimizer that can improve performance. Annotating pointers with the restrict keyword is still left to the programmer. This task is, in general, tedious and error-prone, especially since the C compiler does not verify that the restrict keyword is used correctly. In this paper we present a static analysis tool that (i) finds CUDA kernel call sites at which the actual parameters do not alias; (ii) clones the kernels called at such sites; (iii) after performing an alias analysis within these kernels, adds the restrict keyword to their arguments; and (iv) replaces the original kernel call with a call to the optimized clone whenever possible.
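The four steps can be illustrated with a minimal host-side sketch in plain C99. It is not the tool's implementation. The names scale_add, scale_add_noalias and call_sites are hypothetical, and the choice of which call site may use the clone is made by hand here, standing in for the decision the static analysis would make. In CUDA C++ the same qualifier is spelled __restrict__ and would be applied to the kernel's pointer parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Original routine: dst and src may overlap, so the compiler has to
 * assume that a store to dst[i] can change a later load from src. */
void scale_add(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}

/* Hypothetical clone (steps ii and iii): same body, but the arguments are
 * restrict-qualified. Calling it with overlapping ranges is undefined
 * behavior, so only call sites proven alias-free may use it. */
void scale_add_noalias(float *restrict dst, const float *restrict src,
                       float k, size_t n) {
    /* Cheap debug check only; it does not prove the ranges are disjoint. */
    assert(n == 0 || dst != src);
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}

/* Two call sites, rewritten by hand the way step (iv) describes. */
void call_sites(float out[4]) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};

    /* Call site 1: a and b are distinct arrays, so the arguments cannot
     * alias and the call is redirected to the restrict clone. */
    scale_add_noalias(a, b, 2.0f, 4);

    /* Call site 2: dst (a + 1) overlaps src (a), so the call must keep
     * using the original routine. */
    scale_add(a + 1, a, 1.0f, 3);

    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i];
}
```

The second call site carries a genuine loop dependence (each iteration reads the element written by the previous one), which is the kind of call that must keep the original, unannotated version.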
