Parallel implementation of hyper-dimensional dynamical particle system on CUDA

Abstract: This paper presents approaches to the parallel implementation of a solver for a hyper-dimensional dynamical particle system. The proposed approaches are generally applicable to similar particle systems in various research and engineering fields. The original motivation was the simulation of particles that represent a space-filling design, which is optimized for later use in the design of experiments. Because of this purpose, the dimension of the particle system is considered entirely arbitrary. The hyper-dimensional space is further folded into a periodically repeated domain. The theoretical background of the particle system is provided along with the derivation of the equations of motion of the dynamical system. Because neither the number of particles nor the number of dimensions is limited, the possibilities for utilizing the GPGPU platform are more restricted than in today’s fast parallel implementations of common particle systems. Two distinct approaches to parallel implementation are presented: one aims at a generalized use of the fast on-chip resources, while the other relies entirely on the GPU’s on-board global memory. Despite clear differences in their performance, both parallel implementations deliver a major speedup over the single-thread CPU solution, as well as better scaling of execution time as the number of particles and the number of dimensions increase.
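To make the idea of a periodically repeated domain concrete, the following is a minimal sketch of how a distance between two particles could be measured in such a domain for an arbitrary number of dimensions. It assumes a unit hypercube [0, 1)^d and the common minimum-image convention; the function name `periodic_distance` is hypothetical, and the abstract does not confirm that the paper uses exactly this metric.

```python
import math


def periodic_distance(a, b):
    """Minimum-image Euclidean distance in the unit hypercube [0, 1)^d.

    Assumption: the domain is a unit hypercube with periodic boundaries,
    so along each axis the shorter of the direct and the wrapped-around
    separation is used. This is an illustrative sketch only.
    """
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y)
        # Take the shorter path: directly, or across the periodic boundary.
        delta = min(delta, 1.0 - delta)
        total += delta * delta
    return math.sqrt(total)


if __name__ == "__main__":
    # Two points near opposite faces are close once the domain wraps around.
    print(periodic_distance([0.1, 0.5, 0.9], [0.9, 0.5, 0.1]))
```

The loop runs once per coordinate, so the cost of a single pair evaluation grows linearly with the number of dimensions, and an all-pairs evaluation grows quadratically with the number of particles. This is one way to see why both quantities matter for execution time.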
