The Chamomile Scheme: An Optimized Algorithm for N-Body Simulations on Programmable Graphics Processing Units

We present an algorithm named the "Chamomile Scheme". The scheme is fully optimized for calculating gravitational interactions on the latest programmable Graphics Processing Unit (GPU), the NVIDIA GeForce 8800 GTX, which has (a) small but fast shared memories (16 KB × 16) with no broadcasting mechanism and (b) floating-point arithmetic hardware of 500 Gflop/s, but only in single precision. Based on this scheme, we have developed a library for gravitational N-body simulations, "CUNBODY-1", whose measured performance reaches 173 Gflop/s for 2048 particles and 256 Gflop/s for 131072 particles.
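
For readers unfamiliar with the underlying computation, the gravitational interaction that such a library evaluates can be illustrated with a minimal direct-summation sketch. This is not the Chamomile Scheme itself, whose details are not given here, and it is not the CUNBODY-1 interface. The function name `accelerations`, the softening parameter `eps`, the `tile` size, and the choice of units with G = 1 are all assumptions made for illustration. The tiling of source particles only loosely mirrors the idea of staging a block of particles in a small, fast memory before reusing it for every target particle.

```python
import math

def accelerations(pos, mass, eps=1e-3, tile=4):
    """Softened direct-summation accelerations in units where G = 1.

    Hypothetical illustration only. eps must be positive so that the
    self-interaction term (zero separation) contributes exactly zero
    instead of dividing by zero.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for start in range(0, n, tile):
        # Stage one tile of source (j) particles, analogous to loading
        # a block into a small fast memory.
        stop = min(start + tile, n)
        block = [(pos[j], mass[j]) for j in range(start, stop)]
        # Every target (i) particle reuses the staged tile.
        for i in range(n):
            xi, yi, zi = pos[i]
            ax, ay, az = acc[i]
            for (xj, yj, zj), mj in block:
                dx, dy, dz = xj - xi, yj - yi, zj - zi
                r2 = dx * dx + dy * dy + dz * dz + eps * eps
                s = mj / (r2 * math.sqrt(r2))  # m_j / (r^2 + eps^2)^(3/2)
                ax += dx * s
                ay += dy * s
                az += dz * s
            acc[i] = [ax, ay, az]
    return acc
```

Calling `accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0])` returns two equal and opposite accelerations along the x axis. The cost grows as N² pairwise interactions, which is why the sustained interaction rate of the hardware dominates the run time for large particle counts.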
