A framework for visualization of OpenCL applications execution: a tutorial

Evaluating parallel and heterogeneous programs written in OpenCL can be challenging. Simulators are commonly used to aid the programmer in this regard. One of the fundamental requirements of any simulator is to provide traces, reports, and debugging information in a coherent and unambiguous format. Although these traces and reports contain detailed information about the logical and physical transactions within a simulated structure, they are usually extremely large and hard to analyze. What is needed is an appropriate visualization tool that accompanies the simulator and makes the OpenCL execution process easier to understand and analyze. In this tutorial, we present M2S-Visual, an interactive, cycle-by-cycle, trace-driven visualization tool and a complementary addition to Multi2sim (M2S). M2S is an established simulator designed with an emphasis on running OpenCL applications without any source code modifications. The simulation of a complete OpenCL application occurs seamlessly by launching vendor-compliant host and device binaries. The Multi2sim emulator provides traces of Intel x86 CPU and AMD Southern Islands (as well as AMD Evergreen) GPU instructions, and the detailed simulator tracks execution times and the state of architectural components in both host and device. M2S-Visual complements the simulator by providing a visual representation of running instructions, together with the state of the architectural components, through a user-friendly GUI. During the execution of an OpenCL application, M2S-Visual captures and represents the state of CPU and GPU software entities (i.e., contexts, work-groups, wavefronts, and work-items), memory entities (i.e., accesses, sharers, and owners), and network entities (i.e., messages and packets), along with the state of CPU and GPU hardware resources (i.e., cores and compute units), the memory hierarchy (i.e., L1 cache, L2 cache, and main memory), and network resources (i.e., nodes, buses, links, and buffers).
We designed the M2S-Visual tool to support the research community by enabling in-depth analysis of the performance of OpenCL programs. We also introduce new visualization options in M2S, in the form of statistical graphs, that provide further details on OpenCL application characteristics and the utilization of system resources. These include plots that reveal the occupancy of compute units based on static and run-time characteristics of the executed OpenCL kernels, histograms that present the memory access patterns of OpenCL applications, plots that characterize the network traffic generated by transactions between memory modules during the execution of an OpenCL application, and plots that reveal the utilization of network resources (such as links and buses) after the application execution is complete. The tutorial is organized in two parts: full-system visualization of OpenCL application execution via M2S-Visual, and characterization of the impact of OpenCL applications on system resources using the generated static graphs. Each part is accompanied by simulation examples using working demos. All material needed to reproduce these demos, as well as the tutorial slides, will be available on the tutorial website at http://www.multi2sim.org/conferences/iwocl-2015.html.