Parallel block-structured adaptive mesh refinement on graphics processing units.

Block-structured adaptive mesh refinement (AMR) is a technique used when solving partial differential equations to reduce the number of zones needed to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. AMR offers large savings in computation and memory usage without a corresponding reduction in accuracy, but it incurs overhead in managing the mesh hierarchy and adds complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.
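To make the coarsen and refine operators mentioned above concrete, the following is a minimal host-side C++ sketch, not the paper's implementation. It assumes cell-centred data on a uniform 2D patch, an integer refinement ratio `r` that divides the fine patch extents, simple averaging for coarsening, and piecewise-constant injection for refinement. The `Patch` struct and the `coarsen` and `refine` functions are hypothetical names introduced only for illustration. Each output cell is computed independently of the others, which is the property that would let such a loop body map to one GPU thread per cell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container (not from the paper): a cell-centred 2D patch,
// stored row-major with nx cells in x and ny cells in y.
struct Patch {
    std::size_t nx, ny;
    std::vector<double> data;
    Patch(std::size_t nx_, std::size_t ny_)
        : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return data[j * nx + i]; }
    double at(std::size_t i, std::size_t j) const { return data[j * nx + i]; }
};

// Coarsen: each coarse cell becomes the mean of the r x r fine cells it covers.
// On a uniform mesh this preserves the integral of the field (assumption:
// all fine cells have equal volume).
Patch coarsen(const Patch& fine, std::size_t r) {
    assert(r > 0 && fine.nx % r == 0 && fine.ny % r == 0);
    Patch coarse(fine.nx / r, fine.ny / r);
    for (std::size_t J = 0; J < coarse.ny; ++J) {
        for (std::size_t I = 0; I < coarse.nx; ++I) {
            double sum = 0.0;
            for (std::size_t dj = 0; dj < r; ++dj)
                for (std::size_t di = 0; di < r; ++di)
                    sum += fine.at(I * r + di, J * r + dj);
            coarse.at(I, J) = sum / static_cast<double>(r * r);
        }
    }
    return coarse;
}

// Refine: piecewise-constant injection, where every fine cell copies the
// value of the coarse cell that contains it.
Patch refine(const Patch& coarse, std::size_t r) {
    assert(r > 0);
    Patch fine(coarse.nx * r, coarse.ny * r);
    for (std::size_t j = 0; j < fine.ny; ++j)
        for (std::size_t i = 0; i < fine.nx; ++i)
            fine.at(i, j) = coarse.at(i / r, j / r);
    return fine;
}
```

Under these assumptions, refining a patch and then coarsening it with the same ratio returns the original coarse values. Production AMR codes typically use higher-order, slope-limited interpolation for refinement; piecewise-constant injection is used here only to keep the sketch short.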
