Early Experiences Porting Three Applications to OpenMP 4.5

Many application developers need code that runs efficiently on multiple architectures but cannot afford to maintain architecture-specific versions. With the addition of target directives to support offload accelerators, OpenMP now has the machinery to support performance-portable code development. In this paper, we describe ports of three applications, Kripke, Cardioid, and LULESH, to OpenMP 4.5 and discuss our successes and failures. The challenges we encountered include OpenMP's interaction with C++ features such as classes with virtual methods and lambda functions. In addition, the lack of deep copy support in OpenMP increased code complexity. Finally, the inability of GPUs to handle virtual function calls required code restructuring. Despite these challenges, we demonstrate that OpenMP achieves performance within 10% of hand-written CUDA for memory-bandwidth-bound kernels in LULESH. We also show that, with a minor change to the OpenMP standard, register usage for OpenMP code can be reduced by up to 10%.
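To illustrate why missing deep copy support adds code, the following is a minimal sketch of one common workaround, not necessarily the approach taken in the ports described here. The `Field` struct and the `scale_field` function are hypothetical names introduced only for this example. Mapping a struct to the device copies its pointer member bitwise but not the array it points to, so the sketch extracts the pointer and the length into local variables and maps the array section explicitly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical data structure (not from the paper): a struct that holds
// a pointer to heap data. Mapping the struct alone would copy the pointer
// value but not the array behind it.
struct Field {
    int n;         // number of elements
    double* data;  // storage owned elsewhere
};

// Scale every element of the field inside an OpenMP 4.5 target region.
// Assumption: the caller guarantees f.data points to f.n valid doubles.
void scale_field(Field& f, double factor) {
    // Manual "deep copy": pull the members into locals so the array
    // section can be named directly in the map clause.
    double* x = f.data;
    const int n = f.n;

    // The array section x[0:n] is copied to the device on entry and back
    // to the host on exit. Without an offload-capable compiler the pragma
    // is ignored or falls back to the host, and the loop still runs.
    #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
    for (int i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}
```

A caller would build a `Field` over existing storage, for example a `std::vector<double>`, and pass it to `scale_field`. Every struct with pointer members needs this kind of per-member handling, which is one way the extra complexity mentioned above can arise.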
