Efficient Strict-Binning Particle-in-Cell Algorithm for Multi-core SIMD Processors

Particle-in-Cell (PIC) codes are widely used for plasma simulations. On recent multi-core hardware, the performance of these codes is often limited by memory bandwidth. We describe a multi-core PIC algorithm that achieves a close-to-minimal number of transfers to and from main memory, while also exploiting SIMD instructions for numerical computations and exhibiting a high degree of OpenMP-level parallelism. Our algorithm keeps particles sorted by cell at every time step, and represents the particles of a given cell as a linked list of fixed-capacity arrays, called chunks. Chunks support either sequential or atomic insertions, the latter being used to handle fast-moving particles. To validate our code, called Pic-Vert, we consider a 3D electrostatic Landau-damping simulation as well as a 2d3v transverse instability of magnetized electron holes. Performance results on 24-core Intel Skylake hardware confirm the effectiveness of our algorithm, in particular its high throughput and its ability to cope with fast-moving particles.
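
To make the chunk representation concrete, here is a minimal C sketch of a per-cell linked list of fixed-capacity arrays with both insertion modes. It is an illustration under stated assumptions, not the Pic-Vert implementation: the names `particle`, `chunk`, `bag`, `CHUNK_CAPACITY` and all functions are hypothetical, the particle has only two fields, and the atomic insertion simply reports failure when the head chunk is full rather than reproducing whatever overflow strategy the actual code uses.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical capacity, kept tiny so the example overflows quickly. */
#define CHUNK_CAPACITY 4

/* Hypothetical minimal particle: one position and one velocity component. */
typedef struct { double x, vx; } particle;

/* A chunk: fixed-capacity array plus a link to the next chunk. */
typedef struct chunk {
    struct chunk *next;
    atomic_int size;                 /* number of slots claimed so far */
    particle data[CHUNK_CAPACITY];
} chunk;

/* All particles of one cell: a linked list of chunks. */
typedef struct { chunk *head; } bag;

static chunk *chunk_new(chunk *next) {
    chunk *c = malloc(sizeof *c);
    if (c == NULL) abort();
    c->next = next;
    atomic_init(&c->size, 0);
    return c;
}

static void bag_init(bag *b) { b->head = chunk_new(NULL); }

/* Sequential insertion: assumes a single thread owns the bag. */
static void bag_push_serial(bag *b, particle p) {
    chunk *c = b->head;
    int n = atomic_load_explicit(&c->size, memory_order_relaxed);
    if (n >= CHUNK_CAPACITY) {       /* head is full: prepend a fresh chunk */
        c = chunk_new(b->head);
        b->head = c;
        n = 0;
    }
    c->data[n] = p;
    atomic_store_explicit(&c->size, n + 1, memory_order_relaxed);
}

/* Atomic insertion: several threads may target the same bag.
 * Each thread claims a distinct slot with a fetch-and-add.
 * Returns 1 on success, 0 if the head chunk was already full
 * (simplification: the caller must then fall back to another path). */
static int bag_try_push_atomic(bag *b, particle p) {
    chunk *c = b->head;
    int slot = atomic_fetch_add(&c->size, 1);
    if (slot >= CHUNK_CAPACITY) return 0;  /* size overshoots; readers clamp */
    c->data[slot] = p;
    return 1;
}

/* Count stored particles, clamping sizes that overshot the capacity. */
static size_t bag_count(bag *b) {
    size_t total = 0;
    for (chunk *c = b->head; c != NULL; c = c->next) {
        int n = atomic_load(&c->size);
        total += (size_t)(n < CHUNK_CAPACITY ? n : CHUNK_CAPACITY);
    }
    return total;
}

static void bag_free(bag *b) {
    chunk *c = b->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    b->head = NULL;
}
```

In this sketch, a thread that owns a cell would call `bag_push_serial` for particles that stay nearby, while `bag_try_push_atomic` stands in for the case where a fast-moving particle lands in a cell that other threads may be writing to at the same time. Iterating over a chunk's contiguous `data` array is also what makes the per-cell particle loop amenable to SIMD.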
