The AMD APU (Accelerated Processing Unit) architecture, which combines CPU and GPU cores on the same die, is promising for GPU applications whose performance is bottlenecked by the low PCI Express communication rate. However, the first APU generations still have separate CPU and GPU memory partitions. Currently, APU integrated GPUs are also less powerful than discrete GPUs. In this paper, we therefore investigate the relevance of APUs for scientific computing by evaluating and comparing the performance of two successive AMD APUs (family codenames Llano and Trinity), two successive discrete GPUs (chip codenames Cayman and Tahiti), and one hexa-core AMD CPU. For this purpose, we rely on a 3D finite difference stencil that is optimized and tuned in OpenCL. We detail the most interesting optimizations for each architecture and show very good performance in OpenCL: up to 500 Gflops on Tahiti. Finally, our results show that APU integrated GPUs outperform CPUs, and that integrated GPUs of upcoming APUs may match discrete GPUs for problems with high communication requirements.