A massively parallel, multi-disciplinary Barnes-Hut tree code for extreme-scale N-body simulations

Efficiently parallelizing fast multipole-based algorithms for the N-body problem is one of the most challenging topics in high-performance scientific computing. The non-local, irregular communication patterns that these algorithms generate can easily create an insurmountable bottleneck on supercomputers with hundreds of thousands of cores. To overcome this obstacle, we have developed an innovative parallelization strategy for Barnes–Hut tree codes on present and upcoming HPC multicore architectures. This scheme, based on a combined MPI–Pthreads approach, permits an efficient overlap of computation and data exchange. We highlight the capabilities of this method on the full IBM Blue Gene/P system JUGENE at Jülich Supercomputing Centre and demonstrate scaling across 294,912 cores with up to 2,048,000,000 particles. Applying our implementation, PEPC, to laser–plasma interaction and vortex particle methods close to the continuum limit, we demonstrate its potential for ground-breaking advances in large-scale particle simulations.
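The overlap of computation and data exchange mentioned above can be illustrated with a small sketch. It is not the PEPC implementation and uses neither MPI nor Pthreads: Python threads and a queue stand in for both, and every name in it (`REMOTE_NODES`, `communicator`, `worker`, `run`) is hypothetical. The idea shown is only that a dedicated communicator thread fetches missing tree nodes while worker threads move on to other particles instead of waiting.

```python
import queue
import threading

# Hypothetical stand-in for tree-node data owned by other processes.
REMOTE_NODES = {key: key + 1.0 for key in range(8)}


def communicator(requests, cache, lock):
    """Serve fetch requests until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        key, done = item
        value = REMOTE_NODES[key]  # a real code would exchange a message here
        with lock:
            cache[key] = value
        done.set()  # signal that this node is now available locally


def worker(particles, requests, cache, lock, results):
    """Accumulate node contributions; defer particles with missing nodes."""
    deferred = []
    for pid, keys in particles:
        total, pending = 0.0, []
        for key in keys:
            with lock:
                present = key in cache
                value = cache.get(key)
            if present:
                total += value
            else:
                done = threading.Event()
                requests.put((key, done))  # ask for the node, do not wait
                pending.append((key, done))
        if pending:
            deferred.append((pid, total, pending))  # finish this one later
        else:
            with lock:
                results[pid] = total
    # Return to deferred particles once all other work has been issued.
    for pid, total, pending in deferred:
        for key, done in pending:
            done.wait()
            with lock:
                total += cache[key]
        with lock:
            results[pid] = total


def run(num_workers=2, num_particles=16):
    requests, cache, results = queue.Queue(), {}, {}
    lock = threading.Lock()
    comm = threading.Thread(target=communicator, args=(requests, cache, lock))
    comm.start()
    # Each hypothetical particle needs two node contributions.
    work = [(pid, [pid % 8, (pid + 3) % 8]) for pid in range(num_particles)]
    threads = [
        threading.Thread(
            target=worker,
            args=(work[i::num_workers], requests, cache, lock, results),
        )
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    requests.put(None)  # stop the communicator
    comm.join()
    return results


if __name__ == "__main__":
    print(run())
```

In this sketch a worker never blocks on a request while it still has untouched particles, which is the property the abstract attributes to the hybrid scheme; how PEPC schedules its threads in detail is not stated here.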
