Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework that helps researchers develop particle-based simulation programs for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, redistribution of particles, and gathering of particle information for the interaction calculation. In addition, even without distributed-memory parallelism, algorithms such as the Barnes-Hut tree method are needed to reduce the cost of long-range interactions, and methods that restrict the calculation to neighboring particles are needed for short-range interactions. FDPS provides all of these functions, required for efficient parallel execution of particle-based simulations, as "templates" that are independent of the actual data structure of the particles and of the functional form of the interaction. Using FDPS, researchers can write their programs with only the amount of work needed for a simple, sequential, unoptimized program with O(N^2) calculation cost, and yet the program, once compiled with FDPS, runs efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the measured performance of such programs together with a performance model. The weak-scaling performance is very good: almost linear speedup was obtained up to the full system of the K computer. The minimum calculation time per timestep ranges from 30 ms (N = 10^7) to 300 ms (N = 10^9), currently limited by the time spent on domain decomposition and on the communication required for the interaction calculation. We discuss how these bottlenecks can be overcome.
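
To make the "template" idea concrete, the sketch below shows the two pieces a user of such a framework actually writes: a plain particle data structure and a pairwise interaction kernel. The names (Particle, calcGravity) and the direct O(N^2) driver are illustrative only, not FDPS's actual API; in FDPS the framework itself decides, after domain decomposition and tree traversal, which groups of i- and j-particles the kernel is applied to.

```cpp
// Minimal sketch of the user-side code for a gravitational N-body run.
// NOTE: hypothetical names, not FDPS's real interface; the direct double
// loop in main() stands in for the framework's parallel tree machinery.
#include <cmath>
#include <cstdio>
#include <vector>

struct Particle {        // user-defined particle data structure
    double mass;
    double pos[3];
    double acc[3];
};

// User-defined interaction: softened Newtonian gravity exerted by the
// j-particles on the i-particles. The Plummer softening eps2 keeps the
// force finite as r -> 0, so the i == j self-term contributes exactly zero.
void calcGravity(Particle* ip, int ni, const Particle* jp, int nj,
                 double eps2) {
    for (int i = 0; i < ni; ++i) {
        double ax = 0.0, ay = 0.0, az = 0.0;
        for (int j = 0; j < nj; ++j) {
            const double dx = jp[j].pos[0] - ip[i].pos[0];
            const double dy = jp[j].pos[1] - ip[i].pos[1];
            const double dz = jp[j].pos[2] - ip[i].pos[2];
            const double r2   = dx * dx + dy * dy + dz * dz + eps2;
            const double rinv = 1.0 / std::sqrt(r2);
            const double mr3i = jp[j].mass * rinv * rinv * rinv;
            ax += mr3i * dx;
            ay += mr3i * dy;
            az += mr3i * dz;
        }
        ip[i].acc[0] = ax;
        ip[i].acc[1] = ay;
        ip[i].acc[2] = az;
    }
}

int main() {
    const double eps2 = 1.0e-6;
    std::vector<Particle> p = {
        {1.0, {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}},
        {1.0, {1.0, 0.0, 0.0}, {0.0, 0.0, 0.0}},
    };
    // Per step, a framework like FDPS would (1) decompose the domain,
    // (2) exchange particles between processes, (3) gather the remote
    // particle data needed locally, and (4) call the user's kernel on
    // the particle groups produced by the tree traversal. Here we simply
    // evaluate all pairs directly.
    calcGravity(p.data(), static_cast<int>(p.size()),
                p.data(), static_cast<int>(p.size()), eps2);
    std::printf("acc of particle 0: (%g, %g, %g)\n",
                p[0].acc[0], p[0].acc[1], p[0].acc[2]);
    return 0;
}
```

Because the kernel only ever sees arrays of i- and j-particles, the same function works whether the j side comes from this direct loop, from a Barnes-Hut tree cell, or from superparticles gathered from remote domains; this separation is what allows the framework's templates to stay independent of both the particle layout and the interaction's functional form.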
