From Describing to Prescribing Parallelism: Translating the SPEC ACCEL OpenACC Suite to OpenMP Target Directives

Current and next-generation HPC systems will exploit accelerators and self-hosting devices within their compute nodes to accelerate applications. This comes at a time when programmer productivity and the ability to produce portable code have been recognized as major concerns. One of the goals of OpenMP and OpenACC is to allow the user to specify parallelism via directives so that compilers can generate device-specific code and optimizations. However, porting codes becomes more complex because different architectures offer different types of parallelism and different memory hierarchies. In this paper we discuss our experience with porting the SPEC ACCEL benchmarks from OpenACC to OpenMP 4.5 using a performance-portable style that lets the compiler make platform-specific optimizations to achieve good performance on a variety of systems. The ported SPEC ACCEL OpenMP benchmarks were validated on different platforms, including Xeon Phi, GPUs, and CPUs. We believe that this experience can help the community and compiler vendors understand how users plan to write OpenMP 4.5 applications in a performance-portable style.
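As a rough illustration of the kind of translation the abstract describes, the sketch below writes one loop twice, once with an OpenACC directive and once with an OpenMP 4.5 target directive. The kernel, the function names `saxpy_acc` and `saxpy_omp`, and the choice of clauses are assumptions made for illustration; they are not taken from the SPEC ACCEL sources or from the paper. A compiler built without OpenACC or OpenMP support ignores the pragmas and runs both loops serially.

```c
#include <assert.h>

/* Hypothetical kernel (not from SPEC ACCEL): y = a * x + y. */

/* OpenACC version: the directive describes the loop as parallel and
 * leaves the mapping onto the device's parallelism to the compiler. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != 0 && y != 0);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* OpenMP 4.5 version: the combined construct prescribes offload to a
 * device, a league of teams, and threads within each team. No team or
 * thread counts are given, so the compiler and runtime pick them. */
void saxpy_omp(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != 0 && y != 0);
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Both functions are called the same way, for example `saxpy_omp(n, 2.0f, x, y)`. Omitting device-specific tuning clauses such as `num_teams` or `thread_limit` is one possible reading of a performance-portable style, since it leaves platform-specific decisions to the compiler.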
