Dynamic memory optimization and parallelism management for OpenCL

Recently, multiprocessor platforms have become a mainstream approach to achieving high performance. OpenCL (Open Computing Language) is one of the programming standards for heterogeneous multiprocessors and provides portability across these platforms. Our research focuses on platforms with CPUs and GPUs, since GPUs are now in widespread use. On such a platform, two programming issues may significantly affect GPU computing performance: one is workload distribution, and the other is the use of the GPU memory hierarchy. To fully exploit the characteristics of GPUs, programmers must be not only proficient in parallel programming but also familiar with the hardware specifications. Therefore, in this paper, we propose a compilation pass that automatically optimizes OpenCL kernels. Our compilation pass transforms a naïve input kernel function by applying optimizations including kernel function analysis, work-group rearrangement, memory coalescing, and work-item merging. In addition, our framework is implemented in a runtime system so that it can dynamically adjust the optimization parameters according to the hardware specifications. In terms of execution time, the optimized kernels generated by our design achieve significant performance improvements over the naïve versions. Although the optimizations performed at runtime incur time overhead, this overhead can be amortized by intensive kernel computation or large input data in most cases.
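
As a rough illustration of what memory coalescing and work-item merging mean for an OpenCL kernel (this sketch is not taken from the paper; the kernel names and the element-scaling operation are invented for illustration), the following contrasts a naïve kernel with a hand-transformed one:

```c
// Hypothetical naïve kernel: each work-item processes a private contiguous
// chunk, so neighbouring work-items access addresses far apart in the same
// instruction (uncoalesced global-memory access).
__kernel void scale_naive(__global const float *in,
                          __global float *out,
                          int elems_per_item)
{
    int gid = get_global_id(0);
    for (int i = 0; i < elems_per_item; ++i)
        out[gid * elems_per_item + i] = 2.0f * in[gid * elems_per_item + i];
}

// Sketch of a transformed kernel: strided indexing makes the accesses of a
// work-group contiguous (coalesced), and the loop also acts as a work-item
// merge, letting one work-item do the work of several so the launch can use
// fewer work-items with more work each.
__kernel void scale_optimized(__global const float *in,
                              __global float *out,
                              int n)
{
    int gid    = get_global_id(0);
    int stride = get_global_size(0);
    // Consecutive work-items read consecutive addresses on every iteration,
    // which the GPU can service with coalesced memory transactions.
    for (int i = gid; i < n; i += stride)
        out[i] = 2.0f * in[i];
}
```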