CPU–GPU hybrid parallel strategy for cosmological simulations

Gadget is a simulation code for N‐body and smoothed particle hydrodynamics (SPH) problems in cosmology, and it is widely used to solve a range of cosmological problems. An N‐body simulation tracks the motion of N particles under their mutual interactions, and SPH is a fluid simulation algorithm that models fluid motion by representing the fluid as a set of particles. Most prior work has focused on accelerating Gadget on either multi‐core CPU platforms or graphics processing unit (GPU) platforms. However, none of these efforts achieved CPU–GPU hybrid computing, so a large share of the available CPU computing resources was wasted.
