Performance Portability Evaluation for OpenACC on Intel Knights Corner and NVIDIA Kepler

OpenACC is a programming standard designed to simplify heterogeneous parallel programming through compiler directives. Because OpenACC compilers can generate OpenCL and CUDA code, and the CAPS HMPP compiler supports running OpenCL on Intel Knights Corner, it is attractive to use OpenACC on hardware with different underlying microarchitectures. This paper studies how realistic it is to use a single OpenACC source code across such hardware. Intel Knights Corner and NVIDIA Kepler products are the targets of the experiments, since they have the latest architectures and similar peak performance. The CAPS OpenACC compiler is used to compile the EPCC OpenACC benchmark suite and the Stream and MaxFlops benchmarks from SHOC in order to assess performance. To study performance portability, a roofline model and a relative performance model are built from the experimental data. The results show that specific benchmarks achieve at most 82% of peak performance on Kepler and Knights Corner, but as arithmetic intensity rises the average performance is approximately 10%. There is also a large performance gap between Intel Knights Corner and NVIDIA Kepler on several benchmarks. This study confirms that the performance portability of OpenACC is related to arithmetic intensity and that a large performance gap still exists between hardware platforms on specific benchmarks.
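
As a rough illustration of the roofline model mentioned above, the following minimal sketch computes the attainable performance bound for a kernel from its arithmetic intensity. It uses the standard roofline formula, not necessarily the exact formulation used in this paper. The function names and all numeric values (a 1000 GFLOP/s peak, a 200 GB/s memory bandwidth) are hypothetical placeholders, not measurements from Knights Corner or Kepler.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Roofline bound: the lower of the compute roof and the memory roof.

    arithmetic_intensity is in FLOPs per byte moved to or from memory.
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory bound."""
    return peak_gflops / bandwidth_gbs


def fraction_of_roof(measured_gflops, peak_gflops, bandwidth_gbs,
                     arithmetic_intensity):
    """Measured performance as a fraction of the roofline bound."""
    roof = attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity)
    return measured_gflops / roof


if __name__ == "__main__":
    # Hypothetical device parameters, chosen only for illustration.
    PEAK = 1000.0      # GFLOP/s
    BANDWIDTH = 200.0  # GB/s

    for ai in (0.25, 0.5, 5.0, 10.0):
        bound = attainable_gflops(PEAK, BANDWIDTH, ai)
        print(f"AI = {ai:5.2f} FLOP/byte -> bound = {bound:7.1f} GFLOP/s")
```

Under these assumed parameters, kernels with an arithmetic intensity below the ridge point are limited by memory bandwidth and those above it by peak compute, which is why the fraction of peak a benchmark achieves is discussed in relation to its arithmetic intensity.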