OpenACC - First Experiences with Real-World Applications

Today's trend to use accelerators like GPGPUs in heterogeneous computer systems has entailed several low-level APIs for accelerator programming. However, programming these APIs is often tedious and therefore unproductive. To tackle this problem, recent approaches employ directive-based high-level programming for accelerators. In this work, we present our first experiences with OpenACC, an API consisting of compiler directives to offload loops and regions of C/C++ and Fortran code to accelerators. We compare the performance of OpenACC to PGI Accelerator and OpenCL for two real-world applications and evaluate programmability and productivity. We find that OpenACC offers a promising ratio of development effort to performance and that a directive-based approach to program accelerators is more efficient than low-level APIs, even if suboptimal performance is achieved.

[1]  K. Milfeld,et al.  Early experiences with the intel many integrated cores accelerated computing technology , 2011 .

[2]  Bronis R. de Supinski,et al.  OpenMP for Accelerators , 2011, IWOMP.

[3]  Marina Schroder Facing the Multicore-Challenge - Aspects of New Paradigms and Technologies in Parallel Computing [Proceedings of a conference held at the Heidelberger Akademie der Wissenschaften, March 17-19, 2010] , 2011, Facing the Multicore-Challenge.

[4]  Jack J. Purdum,et al.  C programming guide , 1983 .

[5]  Christian Brecher,et al.  Simulation of bevel gear cutting with GPGPUs—performance and productivity , 2011, Computer Science - Research and Development.

[6]  R. Dolbeau,et al.  HMPP TM : A Hybrid Multi-core Parallel Programming Environment , 2022 .

[7]  Tarek S. Abdelrahman,et al.  hiCUDA: High-Level GPGPU Programming , 2011, IEEE Transactions on Parallel and Distributed Systems.

[8]  Aaftab Munshi,et al.  The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[9]  H. Martin Bücker,et al.  Parallel Minimum p-Norm Solution of the Neuromagnetic Inverse Problem for Realistic Signals Using Exact Hessian-Vector Products , 2008, SIAM J. Sci. Comput..

[10]  Josef Weidendorfer,et al.  Considering GPGPU for HPC Centers: Is It Worth the Effort? , 2010, Facing the Multicore-Challenge.

[11]  Uday Bondhugula,et al.  Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU , 2010 .

[12]  Alejandro Duran,et al.  Extending OpenMP to Survive the Heterogeneous Multi-Core Era , 2010, International Journal of Parallel Programming.

[13]  William Gropp,et al.  OpenMP in the Petascale Era - 7th International Workshop on OpenMP, IWOMP 2011, Chicago, IL, USA, June 13-15, 2011. Proceedings , 2011, IWOMP.

[14]  Rudolf Eigenmann,et al.  OpenMPC: Extended OpenMP Programming and Tuning for GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.