Resident Block-Structured Adaptive Mesh Refinement on Thousands of Graphics Processing Units

Block-structured adaptive mesh refinement (AMR) is a technique for solving partial differential equations that reduces the number of cells needed to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. AMR can greatly reduce computational cost and memory usage without a corresponding loss of accuracy, but managing the mesh hierarchy adds overhead and introduces complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a resident GPU-based AMR library, including the classes used to manage data on a mesh patch, the routines used to transfer data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an 8-node cluster and 4,196 nodes of Oak Ridge National Laboratory's Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation and scales to 4,196 K20x GPUs using a combination of MPI and CUDA.
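
To make the coarsen and refine operations concrete, the following is a minimal serial sketch in Python. It is not the library's implementation: the function names `coarsen` and `refine`, the refinement ratio of 2, the cell-averaging rule, and the piecewise-constant injection are all assumptions chosen for illustration. In a GPU version, each output cell would typically be computed by its own thread, since every output value depends only on a small, fixed set of input cells.

```python
def coarsen(fine, ratio=2):
    """Average each ratio x ratio block of fine cells into one coarse cell.

    Hypothetical illustration: cell-centred data on a 2D patch stored as a
    list of rows. Averaging preserves the patch total for equal-area cells.
    """
    ny, nx = len(fine), len(fine[0])
    if ny % ratio or nx % ratio:
        raise ValueError("patch dimensions must be divisible by the ratio")
    coarse = []
    for j in range(0, ny, ratio):
        row = []
        for i in range(0, nx, ratio):
            total = sum(fine[j + dj][i + di]
                        for dj in range(ratio)
                        for di in range(ratio))
            row.append(total / (ratio * ratio))
        coarse.append(row)
    return coarse


def refine(coarse, ratio=2):
    """Copy each coarse cell into a ratio x ratio block of fine cells.

    Piecewise-constant injection is assumed here for simplicity; production
    codes often use higher-order, slope-limited interpolation instead.
    """
    fine = []
    for row in coarse:
        fine_row = [value for value in row for _ in range(ratio)]
        for _ in range(ratio):
            fine.append(list(fine_row))
    return fine


if __name__ == "__main__":
    patch = [[1.0, 3.0], [5.0, 7.0]]
    print(coarsen(patch))
    print(refine(coarsen(patch)))
```

With these particular choices, refining a patch and then coarsening it returns the original coarse data, which is a convenient consistency check for any pair of such operators.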
