Profiling OpenCL Kernels Using Wavefront Occupancy with Radeon GPU Profiler

Profiling OpenCL [3] applications on modern GPUs is usually limited to gathering host-side timestamps or performance counter data for a complete GPU kernel. We show the limitations of existing performance counter-based methods when optimizing complex applications: they provide only information aggregated over a kernel's lifetime, and they give no insight into load balancing across shader engines or into the behavior of a GPU kernel over time. We then present the Radeon GPU Profiler (RGP) [2], a performance analysis tool that lets OpenCL developers understand how their device is utilized while their OpenCL kernels execute.