CloverLeaf: Preparing Hydrodynamics Codes for Exascale

In this work we directly evaluate five candidate programming models for future exascale applications (MPI, MPI+OpenMP, MPI+OpenACC, MPI+CUDA and CAF) using a recently developed Lagrangian-Eulerian explicit hydrodynamics mini-application. The aim of this work is to better inform exascale planning at large HPC centres such as AWE. Such organisations invest significant resources in maintaining and updating existing scientific codebases, many of which were not designed to scale to the levels of computation that future exascale architectures will demand. We present our results and experiences of scaling these different approaches to high node counts on existing large-scale Cray systems (Titan and HECToR). We also examine the effect that improving the mapping between process layout and the underlying machine interconnect topology can have on performance and scalability, and highlight several communication-focused optimisations.
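As an illustration of the process-layout and topology-mapping concerns raised above, the sketch below builds a 2D MPI Cartesian communicator with rank reordering enabled and posts a non-blocking halo exchange with its x-direction neighbours. This is a minimal, self-contained C example, not the CloverLeaf implementation; the grid shape, halo buffer length and message tag are illustrative assumptions.

```c
/* Minimal sketch: 2D Cartesian process layout with rank reordering and a
 * non-blocking halo exchange. Illustrative only; not the CloverLeaf code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Let MPI factor the job into a 2D grid; reorder=1 allows the library
     * to renumber ranks to better match the interconnect topology. */
    int dims[2] = {0, 0};
    int periods[2] = {0, 0};
    MPI_Dims_create(world_size, 2, dims);

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, /*reorder=*/1, &cart);

    int cart_rank, coords[2];
    MPI_Comm_rank(cart, &cart_rank);
    MPI_Cart_coords(cart, cart_rank, 2, coords);

    /* Nearest neighbours for a halo (ghost-cell) exchange; boundary ranks
     * receive MPI_PROC_NULL, which the communication calls treat as a no-op. */
    int left, right, down, up;
    MPI_Cart_shift(cart, 0, 1, &left, &right);
    MPI_Cart_shift(cart, 1, 1, &down, &up);

    /* Illustrative halo buffers: one double per face cell (assumed size). */
    enum { NY = 64 };
    double send_x[NY], recv_x[NY];
    for (int j = 0; j < NY; ++j) send_x[j] = (double)cart_rank;

    /* Exchange the x-direction halo with non-blocking calls, a common
     * communication optimisation over blocking send/receive pairs. */
    MPI_Request reqs[2];
    MPI_Irecv(recv_x, NY, MPI_DOUBLE, left,  0, cart, &reqs[0]);
    MPI_Isend(send_x, NY, MPI_DOUBLE, right, 0, cart, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    if (world_rank == 0)
        printf("grid %d x %d, rank 0 at (%d,%d)\n",
               dims[0], dims[1], coords[0], coords[1]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```

Note that passing reorder=1 only permits renumbering; whether any actually occurs depends on the MPI implementation, and on Cray systems explicit rank-order placement mechanisms are typically used to change the node mapping in practice.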
