Data decomposition in Monte Carlo neutron transport simulations using global view arrays

Accommodating large tally data is a challenging problem for Monte Carlo neutron transport simulations. Current approaches rely either on simple data replication or on application-controlled decomposition, such as domain partitioning or client/server models, and are limited by either memory cost or performance loss. We propose and analyze an alternative solution based on global view arrays. With global view arrays, tallies are naturally partitioned into small, globally addressable blocks that fit in the limited on-node memory of compute nodes, which yields both scalable memory use and high performance efficiency. This approach also makes the code much simpler to program than application-controlled approaches. Our implementation integrates Global View Resilience (GVR), a global view library built on MPI one-sided communication, into the OpenMC Monte Carlo transport code. The remote memory access (RMA)-based global view array implementation achieves 85% efficiency at 16,384 processes relative to 1,000 processes, with a 2.39 TB mesh tally across 1,366 nodes on a Cray XC30 supercomputer. Our results improve scalability significantly compared with the tally server approach and are better than any other published results, indicating that global view arrays are a promising alternative for enabling full-core light water reactor analysis on current and future computer systems.
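The block partitioning described above can be illustrated with a minimal single-process sketch. It is not the GVR or OpenMC implementation: the names `BlockLayout`, `GlobalTally`, `locate`, and `accumulate` are hypothetical, the even block distribution is an assumption, and the "remote" blocks are plain Python lists standing in for memory owned by other ranks. In an MPI one-sided setting, the accumulate step could plausibly map onto an operation such as `MPI_Accumulate`, but that mapping is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLayout:
    """Hypothetical even block distribution of a 1-D tally array over ranks."""
    n_elements: int
    n_ranks: int

    @property
    def block_size(self) -> int:
        # Ceiling division so every element has an owner.
        return -(-self.n_elements // self.n_ranks)

    def local_size(self, rank: int) -> int:
        # Number of elements stored on `rank` (the last blocks may be short or empty).
        start = rank * self.block_size
        return max(0, min(self.block_size, self.n_elements - start))

    def locate(self, global_index: int) -> tuple[int, int]:
        # Translate a global index into (owner rank, offset within that rank's block).
        if not 0 <= global_index < self.n_elements:
            raise IndexError(f"global index {global_index} out of range")
        return divmod(global_index, self.block_size)


class GlobalTally:
    """Single-process stand-in for a globally addressable tally array."""

    def __init__(self, layout: BlockLayout) -> None:
        self.layout = layout
        # One list per rank; in a real run each block would live in that rank's memory.
        self.blocks = [[0.0] * layout.local_size(r) for r in range(layout.n_ranks)]

    def accumulate(self, global_index: int, score: float) -> None:
        # The caller uses only the global index; ownership is resolved here.
        owner, offset = self.layout.locate(global_index)
        self.blocks[owner][offset] += score

    def get(self, global_index: int) -> float:
        owner, offset = self.layout.locate(global_index)
        return self.blocks[owner][offset]


if __name__ == "__main__":
    tally = GlobalTally(BlockLayout(n_elements=10, n_ranks=4))
    tally.accumulate(7, 0.5)
    tally.accumulate(7, 0.25)
    print(tally.layout.locate(7), tally.get(7))
```

The point of the sketch is that the scoring code addresses the tally by global index only; which process holds the block is hidden behind `locate`, which is the property that distinguishes a global view from application-controlled decomposition.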