An MPI-CUDA implementation of an improved Roe method for two-layer shallow water systems

The numerical solution of two-layer shallow water systems is required to accurately simulate stratified fluids, which are ubiquitous in nature: they appear in atmospheric flows, ocean currents, and oil spills, among other settings. Moreover, applying numerical schemes for these models to realistic scenarios places heavy demands on computing power. In this paper, we tackle the acceleration of these simulations on triangular meshes by exploiting the combined power of several CUDA-enabled GPUs in a GPU cluster. For that purpose, we present an improvement of a path-conservative Roe-type finite volume scheme that is especially suitable for GPU implementation, and we develop a distributed implementation of this scheme that uses CUDA and MPI to exploit the potential of a GPU cluster. This implementation overlaps MPI communication with CPU-GPU memory transfers and GPU computation to increase efficiency. Several numerical experiments, performed on a cluster of modern CUDA-enabled GPUs, show the efficiency of the distributed solver.
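The overlap of communication with computation mentioned above can be illustrated with a minimal, CPU-only sketch. It is not the paper's MPI-CUDA solver: a background thread stands in for the non-blocking exchange, a simple three-point smoothing stencil stands in for the Roe-type scheme, and all names (`smooth`, `step_serial`, `step_overlapped`) are hypothetical. The point is only the ordering: start the halo exchange, update the cells that need no remote data, then finish the cells next to the subdomain interface once the halo has arrived.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth(left, center, right):
    # Stand-in cell update (assumption): a simple three-point average.
    return 0.25 * left + 0.5 * center + 0.25 * right


def step_serial(u):
    # Reference update on the whole domain; the two end cells stay fixed.
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = smooth(u[i - 1], u[i], u[i + 1])
    return new


def step_overlapped(u):
    # Split the domain into two subdomains, as two processes would own them.
    # Each half must contain at least two cells.
    mid = len(u) // 2
    left, right = u[:mid], u[mid:]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Start the "halo exchange" in the background (stand-in for
        # non-blocking communication plus asynchronous memory copies).
        halo_for_left = pool.submit(lambda: right[0])
        halo_for_right = pool.submit(lambda: left[-1])

        # Meanwhile, update the cells that need no remote data.
        new_left = list(left)
        for i in range(1, len(left) - 1):
            new_left[i] = smooth(left[i - 1], left[i], left[i + 1])
        new_right = list(right)
        for i in range(1, len(right) - 1):
            new_right[i] = smooth(right[i - 1], right[i], right[i + 1])

        # Wait for the halo values, then update the interface cells.
        new_left[-1] = smooth(left[-2], left[-1], halo_for_left.result())
        new_right[0] = smooth(halo_for_right.result(), right[0], right[1])
    return new_left + new_right


if __name__ == "__main__":
    u = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
    print(step_overlapped(u) == step_serial(u))
```

In a real MPI-CUDA code the same ordering would typically rely on non-blocking MPI calls and asynchronous device copies on separate streams; the sketch only checks that splitting the update this way reproduces the single-domain result.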
