Algorithmic choices in WARP – A framework for continuous energy Monte Carlo neutron transport in general 3D geometries on GPUs

Abstract

In recent supercomputers, general purpose graphics processing units (GPGPUs) make up a significant fraction of the supercomputer's total computational power. GPGPUs have different architectures from central processing units (CPUs). For the Monte Carlo neutron transport codes used in nuclear engineering to take advantage of these coprocessor cards, their transport algorithms must be changed to execute efficiently on them. WARP is a continuous energy Monte Carlo neutron transport code written to do this. The main thrust of WARP is to adapt earlier event-based transport algorithms to GPU hardware, and this paper presents the algorithmic choices made for each part of the code. It is found that remapping history data references increases the GPU processing rate once histories begin to complete. The main reasons are that completed histories are eliminated from the address space, threads are kept busy, and memory bandwidth is not wasted on checking completed data. Remapping also allows the interaction kernels to be launched concurrently, which further improves efficiency. The OptiX ray tracing framework and the CUDPP library are used for geometry representation and for parallel dataset-wide operations, respectively, ensuring high performance and reliability.
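The remapping idea described above can be illustrated with a minimal, serial Python sketch. It is not WARP's implementation: the reaction codes `DONE`, `SCATTER`, and `FISSION` and all helper names are hypothetical, and on a GPU the prefix sum would be a parallel primitive (of the kind CUDPP provides) while each `launch` call would be a kernel launch. The sketch builds a remap array of active history indices grouped by reaction type using scan-based stream compaction, so completed histories never appear in it and no thread is assigned to them.

```python
from itertools import accumulate

# Hypothetical reaction codes for this sketch; WARP's actual encoding is not shown here.
DONE = 0      # history has terminated (absorbed or leaked)
SCATTER = 1
FISSION = 2
KERNEL_ORDER = (SCATTER, FISSION)


def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] = flags[0] + ... + flags[i-1]."""
    if not flags:
        return []
    return [0] + list(accumulate(flags))[:-1]


def compact(indices, keep):
    """Scan-based stream compaction: each kept index is scattered to the
    output slot given by the exclusive scan of the keep flags."""
    flags = [1 if keep(i) else 0 for i in indices]
    offsets = exclusive_scan(flags)
    total = offsets[-1] + flags[-1] if flags else 0
    out = [None] * total
    for i, flag, off in zip(indices, flags, offsets):
        if flag:
            out[off] = i
    return out


def build_remap(rxn):
    """Return (remap, ranges): remap lists active history indices grouped by
    reaction type; ranges[code] = (start, count) of that group in remap.
    Histories marked DONE are dropped entirely."""
    all_idx = list(range(len(rxn)))
    remap, ranges = [], {}
    for code in KERNEL_ORDER:
        group = compact(all_idx, lambda i: rxn[i] == code)
        ranges[code] = (len(remap), len(group))
        remap.extend(group)
    return remap, ranges


def launch(kernel, remap, start, count, state):
    """Serial stand-in for launching `count` GPU threads: thread t handles
    history remap[start + t], so no thread is spent on a completed history."""
    for t in range(count):
        kernel(remap[start + t], state)


def halve_energy(h, state):
    """Hypothetical placeholder kernel, not real scattering physics."""
    state["energy"][h] *= 0.5


# Usage: six histories, two of which have completed.
rxn = [SCATTER, DONE, FISSION, SCATTER, DONE, SCATTER]
remap, ranges = build_remap(rxn)        # remap == [0, 3, 5, 2]
state = {"energy": [2.0] * len(rxn)}
start, count = ranges[SCATTER]          # (0, 3)
launch(halve_energy, remap, start, count, state)
# state["energy"] == [1.0, 2.0, 2.0, 1.0, 2.0, 1.0]
```

Because each reaction type occupies a contiguous slice of the remap array, the slices can be handed to separate kernels that run concurrently, which is the benefit the abstract attributes to remapping. A key-value radix sort on the reaction codes would produce the same grouping and is another common way to build such an array.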
