Tracking Performance Portability on the Yellow Brick Road to Exascale

With Exascale machines on the immediate horizon, there is a pressing need to prepare applications to exploit these systems fully. However, there will be multiple paths to Exascale, with each system relying on processor and accelerator technologies from different vendors. Applications must therefore be portable across these different architectures, and it is equally critical that they run efficiently on each. This dual requirement for portability and efficiency begets the need for performance portability. In this study we survey the performance portability of different programming models, including the open standards OpenMP and SYCL, across the diverse landscape of Exascale and pre-Exascale processors from Intel, AMD, NVIDIA, Fujitsu, Marvell, and Amazon, together encompassing GPUs and CPUs based on both x86 and Arm architectures. We also take a historical view and analyse how performance portability has changed over the last year.
