High Performance Direct Gravitational N-body Simulations on Graphics Processing Units

We present the results of gravitational direct N-body simulations using the graphics processing unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the N-body problem is implemented in the “Compute Unified Device Architecture” (CUDA), using the GPU to speed up the calculations. We tested the implementation on three different N-body codes: two direct N-body integration codes, using the 4th order predictor–corrector Hermite integrator with block time-steps, and one Barnes-Hut treecode, which uses a 2nd order leapfrog integration scheme. The integration of the equations of motion for all codes is performed on the host CPU. We find that for N > 512 particles the GPU outperforms the GRAPE-6Af, if some softening in the force calculation is accepted. Without softening and for very small integration time-steps the GRAPE still outperforms the GPU. We conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special purpose hardware. Using the same time-step criterion, the total energy of the N-body system was conserved to better than one part in 10^6 on the GPU, only about an order of magnitude worse than obtained with GRAPE-6Af. For N ≳ 10^5 the 8800GTX outperforms the host CPU by a factor of about 100 and runs at about the same speed as the GRAPE-6Af.
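For readers unfamiliar with the softened force evaluation mentioned above, the following is a minimal pure-Python sketch of an O(N^2) direct summation with Plummer-type softening. It is not the authors' CUDA kernel: the function name `softened_accelerations`, the parameter names `eps` and `G`, and the default values are assumptions made for illustration only. On a GPU the loop over `i` would typically be spread across threads, while the integration itself would stay on the host, as the abstract describes.

```python
import math


def softened_accelerations(pos, mass, eps=0.01, G=1.0):
    """Return accelerations from direct pairwise summation.

    pos  : list of (x, y, z) positions
    mass : list of particle masses
    eps  : softening length (hypothetical default, not from the paper)
    G    : gravitational constant (N-body units assumed)
    """
    n = len(pos)
    eps2 = eps * eps
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue  # skip self-interaction so eps = 0 is also safe
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            # Softened squared distance: r^2 + eps^2
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * mass[j] * inv_r3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append([ax, ay, az])
    return acc
```

A non-zero `eps` keeps the denominator bounded during close encounters, at the cost of altering the force at separations comparable to `eps`.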
