Notified access in coarray-based hydrodynamics applications on many-core architectures: Design and performance

Abstract With the increasing availability of Remote Direct Memory Access (RDMA) support in computer networks, the so-called Partitioned Global Address Space (PGAS) model has matured considerably in the last few years. A PGAS approach can easily resolve several situations that are awkward to express with message passing, such as particle tracking and adaptive mesh refinement. However, the producer-consumer pattern commonly adopted in task-based parallelism can only be implemented inefficiently, because PGAS separates data transfer from synchronization, whereas message-passing programming models usually unify the two. In this paper we provide two contributions: (1) we propose an extension to the Fortran language that provides the concept of Notified Access by associating regular coarray variables with event variables; (2) we demonstrate that the MPI extension for Notified Access proposed by foMPI can be used effectively to implement the same concept in a PGAS run-time library such as OpenCoarrays. Moreover, for a hydrodynamics mini-application, we find that Fortran 2018 events always perform better than Fortran 2008 sync statements on many-core processors. Finally, we show how the proposed Notified Access can improve performance further.
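As background for the separation the abstract refers to, the following minimal sketch shows the standard Fortran 2018 producer-consumer idiom, in which the data transfer (a coindexed assignment, i.e. a put) and the synchronization (an event post) are two distinct operations; the Notified Access extension proposed in the paper would instead tie the notification to the put itself. The variable names `buffer` and `ready` are purely illustrative.

```fortran
program producer_consumer
  use, intrinsic :: iso_fortran_env, only: event_type
  implicit none

  real :: buffer(1024)[*]        ! coarray holding the transferred data
  type(event_type) :: ready[*]   ! event signalling that the data has arrived
  integer :: me

  me = this_image()
  if (num_images() < 2) error stop "needs at least 2 images"

  if (me == 1) then
     ! Producer: the data transfer (coarray put) ...
     buffer(:)[2] = 42.0
     ! ... and the synchronization (event post) are separate operations.
     event post (ready[2])
  else if (me == 2) then
     ! Consumer: block until the producer has posted the event,
     ! which also orders the preceding put before this segment.
     event wait (ready)
     print *, 'received value:', buffer(1)
  end if
end program producer_consumer
```

With plain events, every transfer needs this explicit post/wait pair; the performance argument in the paper is that fusing notification into the access removes the extra round of synchronization traffic.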
