Toward efficient GPU-accelerated N-body simulations

N-body algorithms apply to many common problems in computational physics, including gravitation, electrostatics, and fluid dynamics. Fast algorithms (those with better than O(N^2) performance) exist, but have not been successfully implemented on GPU hardware for practical problems. In the present work, we report best-in-class performance for a multipole-accelerated treecode method, together with a series of improvements that support implementation of this solver on highly data-parallel graphics processing units (GPUs). The greatly reduced computation times suggest that this problem is ideally suited to current and next-generation single and cluster CPU-GPU architectures. We believe that this is an ideal method for practical computation of large-scale turbulent flows on future supercomputing hardware using parallel vortex particle methods.
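For context, the O(N^2) baseline that fast methods such as treecodes aim to beat is direct pairwise summation. The following is a minimal illustrative sketch for the gravitational case, not the solver described in this work; the function name `direct_sum_accelerations`, the Plummer-style `softening` parameter, and the choice of units with G = 1 are all assumptions made for the example.

```python
import math


def direct_sum_accelerations(positions, masses, softening=1e-3):
    """Direct O(N^2) gravitational accelerations (hypothetical helper, G = 1).

    positions: list of (x, y, z) tuples; masses: list of floats.
    softening: assumed Plummer-style length that regularizes close encounters.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            # Acceleration of i due to j: m_j * (x_j - x_i) / |r|^3
            acc[i][0] += masses[j] * dx * inv_r3
            acc[i][1] += masses[j] * dy * inv_r3
            acc[i][2] += masses[j] * dz * inv_r3
    return acc


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(direct_sum_accelerations(pos, [1.0, 2.0], softening=0.0))
```

Every particle visits every other particle, so the work grows as N^2. A treecode reduces this by replacing distant groups of particles with a single multipole approximation, while the inner loop body above is the kind of independent, arithmetic-heavy work that maps naturally onto data-parallel hardware.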
