GPUs are widely used for high-performance computing. However, standard programming frameworks such as CUDA and OpenCL require low-level specifications; thus programming is difficult and performance is not portable. Therefore, we are developing a new framework named MESI-CUDA. By providing virtual shared variables accessible from both the CPU and GPU, MESI-CUDA hides the complex memory architecture and eliminates low-level API function calls. However, the performance of the current implementation is not sufficient because of the large memory access latency. Therefore, we propose a code-optimization scheme that utilizes the fast on-chip shared memories as a compiler-level explicit cache of the off-chip device memory. The compiler estimates the access count and range of each array using static analysis. For heavily reused variables, the code is modified to make a copy on the shared memory and access that copy, using the small shared memories efficiently. In our evaluation, the scheme achieved a 13%–192% speedup in two of three programs.
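The caching transformation described above can be sketched in plain CUDA. The kernel below is an illustrative assumption, not output of the MESI-CUDA compiler: it stages a reused range of a device-memory array into on-chip shared memory once, then serves all subsequent reads from that fast copy (the names `TILE`, `stencil_cached`, and the stencil weights are hypothetical).

```cuda
#define TILE 256  // illustrative block size, one tile per thread block

// Hand-written sketch of the "shared memory as explicit cache" idea:
// each block copies its tile of `in` (plus a one-element halo) into
// on-chip shared memory, so the three reuses per thread hit the copy
// instead of the slow off-chip device memory.
__global__ void stencil_cached(const float *in, float *out, int n) {
    __shared__ float cache[TILE + 2];          // on-chip copy (+halo)
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;

    // Stage the reused range of `in` into shared memory once...
    if (gid < n) cache[lid] = in[gid];
    if (threadIdx.x == 0)
        cache[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)
        cache[TILE + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
    __syncthreads();                           // copy complete for the block

    // ...then every reuse reads the fast copy, not device memory.
    if (gid < n)
        out[gid] = 0.25f * cache[lid - 1] + 0.5f * cache[lid]
                 + 0.25f * cache[lid + 1];
}
```

In the proposed scheme this rewriting is done automatically by the compiler, guided by its static estimate of each array's access count and range, rather than by the programmer as shown here.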