Multi-GPU Implementation of 2D Shallow Water Equation Code with Block Uniform Quad-Tree Grids

This paper presents a multi-GPU (Graphics Processing Unit) implementation of a 2D shallow water equations solver that can exploit the computational power of modern HPC clusters equipped with several GPUs on different nodes. The domain is discretized by means of a Block Uniform Quadtree (BUQ) grid, which makes it possible to introduce variable resolution efficiently in a GPU-accelerated finite volume code. In the present work the BUQ grid is decomposed into partitions, and each partition is assigned to a dedicated GPU. Communication between partitions is handled by means of the Message Passing Interface (MPI). Computation and communication are overlapped to reduce the overhead of the multi-GPU implementation. The strong scalability test shows an efficiency drop that is better than linear in the number of GPUs used by the simulation, and the weak scalability test shows that the network overhead caused by border communication can be completely hidden behind GPU computation.
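The effect of overlapping computation and communication can be illustrated with a simple per-step timing model. The following is a minimal sketch and not the implementation described in this paper: it assumes that each partition is split into interior cells, which can be updated while the border exchange is in flight, and border cells, which must wait for the exchange to finish. The function names and all timing values are hypothetical, and the efficiency function uses the standard definition of strong scaling efficiency.

```python
def step_time(t_interior, t_border, t_comm, overlap=True):
    """Time of one time step for a single partition (arbitrary units)."""
    if overlap:
        # The border exchange runs concurrently with the interior update;
        # border cells can only be updated once the exchange has completed.
        return max(t_interior, t_comm) + t_border
    # Without overlap the exchange and the full update run back to back.
    return t_comm + t_interior + t_border


def strong_scaling_efficiency(t_one_gpu, t_n_gpus, n_gpus):
    """Standard definition: speed-up divided by the number of GPUs."""
    return t_one_gpu / (n_gpus * t_n_gpus)


# Hypothetical timings, chosen only to illustrate the model.
hidden = step_time(t_interior=8.0, t_border=1.0, t_comm=3.0)
serial = step_time(t_interior=8.0, t_border=1.0, t_comm=3.0, overlap=False)
print(hidden, serial)  # prints "9.0 12.0"
```

In this model the communication cost disappears from the step time whenever `t_comm` does not exceed `t_interior`, which is the situation the weak scalability test refers to when the border communication is completely hidden.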