A runtime system for finite element methods in a partitioned global address space

As exascale performance approaches, applications in high-performance computing (HPC) have to scale to an ever-increasing number of compute nodes. The Global Address Space Programming Interface (GASPI) communication API promises to meet this challenge by providing a highly flexible and efficient programming model in a partitioned global address space (PGAS). Applications well suited to supercomputers include mesh-based solvers for partial differential equations (PDEs), owing to their high computational intensity. Implementing such solvers is highly interdisciplinary and therefore requires separating hardware-specific parallelization techniques from the development of numerical algorithms. We present an open-source runtime system (RTS) that distributes and parallelizes device-agnostic kernels, which define algorithms on unstructured grids. We describe how the RTS abstracts the common parts of iterative solvers and explain how these components are parallelized and distributed. We demonstrate the efficiency of our approach with several microbenchmarks and an implementation of the discontinuous Galerkin method (DGM). The results show that almost all synchronization overhead can be hidden and that the RTS imposes only a small computational cost.
