PKDGRAV3: beyond trillion particle cosmological simulations for the next era of galaxy surveys

We report on the successful completion of a 2 trillion particle cosmological simulation to z = 0, run on the Piz Daint supercomputer (CSCS, Switzerland) using 4000+ GPU nodes for a little less than 80 hours of wall-clock time, or 350,000 node hours. Using multiple benchmarks and performance measurements on the Titan supercomputer at the US Oak Ridge National Laboratory, we demonstrate that our code, PKDGRAV3, delivers, to our knowledge, the fastest time-to-solution for large-scale cosmological N-body simulations. This was made possible by using the Fast Multipole Method (FMM) in conjunction with individual and adaptive particle time steps, both deployed efficiently, and for the first time, on supercomputers with GPU-accelerated nodes. The very low memory footprint of PKDGRAV3 allowed us to run the first-ever benchmark with 8 trillion particles on Titan, achieving perfect scaling up to 18,000 nodes and a peak performance of 10 Pflops.
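To make the time-stepping idea concrete, the sketch below illustrates a standard hierarchical ("block") individual time-step scheme of the kind the abstract refers to: each particle advances on a power-of-two subdivision of a base step, chosen from its local acceleration, so that only a small fraction of particles in dense regions sub-step deeply. This is a minimal illustrative sketch, not the PKDGRAV3 implementation; the O(N^2) direct-sum force loop stands in for the FMM, and all names and parameters here (ETA, MAX_RUNG, the step criterion) are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of hierarchical ("block") individual time steps for a
# gravitational N-body integrator. NOT the PKDGRAV3 implementation: the
# direct-sum force routine below stands in for the FMM, and the step
# criterion and constants are illustrative assumptions.

G = 1.0        # gravitational constant in code units
EPS = 1.0e-2   # Plummer softening length
ETA = 0.2      # dimensionless accuracy parameter for the step criterion
MAX_RUNG = 6   # deepest allowed subdivision: dt_base / 2**MAX_RUNG

def accelerations(pos, mass):
    """Direct-sum softened gravity, O(N^2); a tree/FMM code replaces this."""
    dx = pos[None, :, :] - pos[:, None, :]          # pairwise separations
    r2 = (dx ** 2).sum(-1) + EPS ** 2               # softened distances
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                   # no self-force
    return G * (dx * inv_r3[:, :, None] * mass[None, :, None]).sum(axis=1)

def choose_rungs(acc, dt_base):
    """Assign each particle the largest power-of-two step dt_base/2**rung
    satisfying a simple criterion dt <= ETA * sqrt(EPS / |a|)."""
    a = np.linalg.norm(acc, axis=1)
    dt_want = ETA * np.sqrt(EPS / np.maximum(a, 1e-30))
    rung = np.ceil(np.log2(dt_base / dt_want)).astype(int)
    return np.clip(rung, 0, MAX_RUNG)

def kick_drift_step(pos, vel, mass, dt_base):
    """One base step; simplified (not symplectic) kick-drift sub-stepping."""
    n_sub = 2 ** MAX_RUNG
    dt_min = dt_base / n_sub
    for sub in range(n_sub):
        acc = accelerations(pos, mass)
        rung = choose_rungs(acc, dt_base)
        # A particle on rung r is kicked every 2**(MAX_RUNG - r) substeps.
        active = sub % (2 ** (MAX_RUNG - rung)) == 0
        dt_part = dt_base / 2 ** rung
        vel[active] += acc[active] * dt_part[active, None]   # kick actives
        pos += vel * dt_min                                  # drift everyone
    return pos, vel

if __name__ == "__main__":
    # Toy usage: 64 random particles evolved for a few base steps.
    rng = np.random.default_rng(0)
    pos = rng.standard_normal((64, 3))
    vel = np.zeros((64, 3))
    mass = np.full(64, 1.0 / 64)
    for _ in range(4):
        pos, vel = kick_drift_step(pos, vel, mass, dt_base=0.05)
    print("final center of mass:", pos.mean(axis=0))
```

In a production code the force routine is the O(N) FMM evaluation rather than a direct sum, and typically only the particles active on the current rung have their forces recomputed. The payoff of the rung scheme is that most particles, sitting in low-density regions, take long steps, while only the few in dense structures pay for deep sub-stepping.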
