SPH-EXA: Enhancing the Scalability of SPH codes Via an Exascale-Ready SPH Mini-App

Numerical simulations of fluids in astrophysics and computational fluid dynamics (CFD) are among the most computationally demanding calculations in terms of sustained floating-point operations per second (FLOP/s). These simulations are expected to benefit significantly from future Exascale computing infrastructures, which will perform 10^18 FLOP/s. The performance of SPH codes is, in general, adversely impacted by several factors, such as multiple time-stepping, long-range interactions, and/or boundary conditions. In this work, an extensive study of three SPH implementations, SPHYNX, ChaNGa, and XXX, is performed to gain insights and to expose the limitations and characteristics of these codes. The codes are the starting point of an interdisciplinary co-design project, SPH-EXA, for the development of an Exascale-ready SPH mini-app. We implemented a rotating square patch as a joint test simulation for the three SPH codes and analyzed their performance on a modern HPC system, Piz Daint. The performance profiling and scalability analysis conducted on the three parent codes exposed performance issues such as load imbalance, in both MPI and OpenMP. Two-level load balancing has been successfully applied to SPHYNX to overcome its load imbalance. The performance analysis shapes and drives the design of the SPH-EXA mini-app towards the use of efficient parallelization methods, fault-tolerance mechanisms, and load balancing approaches.
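To illustrate the two-level (MPI + OpenMP) load balancing idea mentioned above, the following is a minimal sketch, not code from SPHYNX or the SPH-EXA mini-app: particles are first distributed across MPI ranks, and within each rank a dynamic OpenMP schedule lets threads pick up work as they finish, smoothing out the uneven per-particle cost that causes intra-node imbalance. The names `Particle` and `computeDensity` are illustrative placeholders.

```cpp
// Minimal two-level load balancing sketch: MPI rank-level decomposition
// plus OpenMP dynamic loop scheduling inside each rank.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

struct Particle { double x, y, z, h, rho; };

// Placeholder for the per-particle work (e.g., a density sum over
// neighbours); in a real SPH code its cost varies with neighbour count.
double computeDensity(const Particle& p) {
    return p.h * p.h * p.h;  // stand-in for the actual SPH kernel sum
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nRanks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nRanks);

    // Level 1: distribute particles across MPI ranks (a simple block
    // split here; the parent codes use spatial domain decomposition).
    const long nGlobal = 1000000;
    long nLocal = nGlobal / nRanks + (rank < nGlobal % nRanks ? 1 : 0);
    std::vector<Particle> particles(nLocal, {0.0, 0.0, 0.0, 0.01, 0.0});

    // Level 2: within a rank, dynamic scheduling assigns chunks of
    // particles to threads on demand, mitigating OpenMP load imbalance.
    #pragma omp parallel for schedule(dynamic, 64)
    for (long i = 0; i < nLocal; ++i) {
        particles[i].rho = computeDensity(particles[i]);
    }

    if (rank == 0)
        std::printf("ranks=%d threads=%d\n", nRanks, omp_get_max_threads());
    MPI_Finalize();
    return 0;
}
```

The design choice sketched here (dynamic or guided OpenMP schedules on top of an MPI domain decomposition) is one common way to address the two levels of imbalance the abstract refers to; the actual mechanism used in SPHYNX may differ.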
