Large Scale Manycore-Aware PIC Simulation with Efficient Particle Binning

We are developing a manycore-aware implementation of a multiprocess PIC (particle-in-cell) simulation code with automatic load balancing. A key issue in the implementation is how to exploit the wide SIMD units of manycore processors such as the Intel Xeon Phi. Our solution is "particle binning", which arranges all particles in a cell (voxel) in one chunk of SOA (structure-of-arrays) one-dimensional arrays so that the "particle-push" and "current-scatter" operations on them are efficiently SIMD-vectorized by the compiler. In addition, our binning mechanism sorts particles by position "on the fly" and copes efficiently with occasional "bin overflow" in a fully multithreaded manner. Our performance evaluation on up to 64 nodes of Cray XC30 and XC40 supercomputers, equipped with Xeon Phi 5120D (Knights Corner) and 7250 (Knights Landing) respectively, showed good parallel performance and confirmed the effectiveness of our binning mechanism.
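
To make the idea of binning concrete, the following is a minimal one-dimensional sketch, not the implementation evaluated above. It assumes a plain counting sort as the binning step, which is a stand-in for the on-the-fly sorting and overflow handling described in the abstract, and it is single-threaded. The names `bin_particles` and `push_cell`, the uniform field value `e_field`, and the charge-to-mass ratio `qm` are all hypothetical. The point is only the data layout: after binning, each cell's particles occupy one contiguous range of the SOA arrays, so the per-cell loop is the kind of unit-stride loop a compiler could vectorize in a compiled language.

```python
from array import array


def bin_particles(xs, vs, n_cells, cell_width):
    """Counting-sort particles into per-cell contiguous ranges of SOA arrays.

    Assumes positions lie in [0, n_cells * cell_width); out-of-range
    positions are clamped to the first or last cell.
    """
    # Cell index of every particle.
    cells = [max(0, min(int(x / cell_width), n_cells - 1)) for x in xs]

    # Pass 1: count particles per cell.
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1

    # Exclusive prefix sum: starts[c] is the first slot of cell c's bin.
    starts = [0] * (n_cells + 1)
    for c in range(n_cells):
        starts[c + 1] = starts[c] + counts[c]

    # Pass 2: scatter each particle into the next free slot of its bin.
    sorted_x = array("d", [0.0] * len(xs))
    sorted_v = array("d", [0.0] * len(xs))
    cursor = starts[:-1]  # slicing copies, so starts is left untouched
    for x, v, c in zip(xs, vs, cells):
        i = cursor[c]
        sorted_x[i] = x
        sorted_v[i] = v
        cursor[c] += 1
    return sorted_x, sorted_v, starts


def push_cell(sorted_x, sorted_v, starts, cell, e_field, qm, dt):
    """Advance all particles of one cell under a uniform field (illustrative).

    The loop touches one contiguous index range with unit stride and a
    field value shared by the whole bin.
    """
    for i in range(starts[cell], starts[cell + 1]):
        sorted_v[i] += qm * e_field * dt
        sorted_x[i] += sorted_v[i] * dt
```

A caller would bin once, for example `sx, sv, starts = bin_particles(xs, vs, n_cells, cell_width)`, and then call `push_cell` for each cell. Particles that leave their cell during the push would have to be moved to another bin, which is the part that the on-the-fly sorting and overflow handling described above address and this sketch omits.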
