Parallel Processing of the Building-Cube Method on a GPU Platform

The Building-Cube Method (BCM), which is based on equally spaced Cartesian meshes, has been proposed as a next-generation CFD method. Because its meshes are equally spaced, BCM is well suited to highly parallel computation. This paper proposes a parallel implementation scheme for BCM on a GPU cluster system. Exploiting the potential of such a system requires efficient hierarchical parallel processing. To obtain massive data parallelism, the proposed scheme employs the Red-Black SOR method for the pressure calculation, which is the most time-consuming part of BCM. By exploiting both the coarse-grain and the fine-grain parallelism of BCM, the scheme hierarchically assigns equally divided tasks to the GPU cluster system. Furthermore, to make full use of the computational power of the GPUs, the scheme employs efficient data management techniques such as coalesced data transfer and data reuse in on-chip memory. Experimental results show that the single-GPU implementation achieves about three times the performance of the single-CPU implementation. They also show that the multi-GPU implementation achieves almost ideal scalability. Finally, the paper discusses the possibility of further accelerating not only the pressure calculation but the whole BCM computation.
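The sketch below illustrates why Red-Black SOR exposes data parallelism in the pressure calculation. It is a minimal, CPU-only Python illustration, not the paper's GPU implementation. It rests on several assumptions: a single cube of n³ cells with a one-cell ghost layer holding fixed Dirichlet values, the standard 7-point discretization of ∇²p = f, and a manufactured test problem. The names `red_black_sor` and `solve_manufactured` are hypothetical. Cells are colored by the parity of i + j + k. Every neighbor of a red cell is black, and vice versa, so all updates within one color's sweep are independent of each other.

```python
import math


def red_black_sor(p, f, h, omega, tol=1e-10, max_iter=10_000):
    """Solve lap(p) = f on the interior cells of one cube with Red-Black SOR.

    p and f are nested lists indexed [i][j][k] with (n + 2) entries per axis;
    the outermost layer of p is a ghost layer of fixed (Dirichlet) values.
    p is updated in place. Returns the number of red + black iterations used.
    """
    n = len(p) - 2
    h2 = h * h
    for it in range(1, max_iter + 1):
        max_change = 0.0
        for color in (0, 1):  # 0 = red cells, 1 = black cells
            # A cell of one color has only neighbors of the other color, so
            # every update in this sweep is independent of the others and
            # could be assigned to its own GPU thread.
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    for k in range(1, n + 1):
                        if (i + j + k) % 2 != color:
                            continue
                        neighbors = (p[i + 1][j][k] + p[i - 1][j][k]
                                     + p[i][j + 1][k] + p[i][j - 1][k]
                                     + p[i][j][k + 1] + p[i][j][k - 1])
                        # Gauss-Seidel value from the 7-point stencil of lap(p) = f.
                        gs = (neighbors - h2 * f[i][j][k]) / 6.0
                        change = omega * (gs - p[i][j][k])  # over-relaxed step
                        p[i][j][k] += change
                        max_change = max(max_change, abs(change))
        if max_change < tol:
            return it
    return max_iter


def solve_manufactured(n=15):
    """Hypothetical test: exact p = sin(pi x) sin(pi y) sin(pi z) on the unit cube."""
    h = 1.0 / (n + 1)
    m = n + 2
    s = [math.sin(math.pi * i * h) for i in range(m)]
    exact = [[[s[i] * s[j] * s[k] for k in range(m)] for j in range(m)]
             for i in range(m)]
    # lap(exact) = -3 pi^2 exact; the ghost layer of p stays 0, which matches
    # exact on the boundary up to rounding.
    f = [[[-3.0 * math.pi ** 2 * exact[i][j][k] for k in range(m)]
          for j in range(m)] for i in range(m)]
    p = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    omega = 2.0 / (1.0 + math.sin(math.pi * h))  # classical optimal SOR factor
    iters = red_black_sor(p, f, h, omega)
    max_err = max(abs(p[i][j][k] - exact[i][j][k])
                  for i in range(1, n + 1)
                  for j in range(1, n + 1)
                  for k in range(1, n + 1))
    return iters, max_err


if __name__ == "__main__":
    iters, err = solve_manufactured(15)
    print(f"converged in {iters} iterations, max error {err:.2e}")
```

Running the file prints the iteration count and the maximum error against the exact solution. Halving h should reduce that error by roughly a factor of four, as expected for a second-order stencil. The parity test inside the loop is kept for readability. A GPU kernel would more typically index only the cells of one color. In a multi-cube setting, neighboring cubes would also need to exchange ghost-cell values between sweeps, which this sketch omits.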
