Radiative Heat Transfer Calculation on 16384 GPUs Using a Reverse Monte Carlo Ray Tracing Approach with Adaptive Mesh Refinement

Modeling thermal radiation is computationally challenging in parallel because of its all-to-all physical, and hence computational, connectivity. Radiation is also the dominant mode of heat transfer in practical applications such as the next-generation clean coal boilers being modeled by the Uintah framework. However, a direct all-to-all treatment of radiation is prohibitively expensive on large computer systems, whether homogeneous or heterogeneous. DOE Titan and the planned DOE Summit and Sierra machines are examples of current and emerging GPU-based heterogeneous systems, where the increased processing capability of GPUs over CPUs exacerbates this problem. These systems require that computational frameworks like Uintah leverage an arbitrary number of on-node GPUs while simultaneously utilizing thousands of GPUs within a single simulation. We show that radiative heat transfer problems can be made to scale within Uintah on heterogeneous systems by combining reverse Monte Carlo ray tracing (RMCRT) with adaptive mesh refinement (AMR) to reduce the amount of global communication. In particular, significant Uintah infrastructure changes, including a novel lock-free and contention-free, thread-scalable data structure for managing MPI communication requests and improved memory allocation strategies, were necessary to achieve excellent strong scaling results to 16384 GPUs on Titan.
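To make the reverse Monte Carlo idea concrete, the following is a minimal single-process sketch and not the Uintah implementation. It assumes a gray, non-scattering medium in a unit cube with cold black walls, a single uniform grid (no AMR, no GPUs, no MPI), and crude fixed-step ray marching. The names `random_direction`, `trace_ray`, and `div_q_at` are hypothetical and chosen only for illustration. Rays are traced backward from the point of interest, and each step adds the emission of the cell it passes through, attenuated by the optical depth accumulated so far.

```python
import math
import random

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def random_direction(rng):
    """Sample a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)


def cell_index(coord, n):
    """Map a coordinate in [0, 1) to a cell index in 0..n-1."""
    return min(int(coord * n), n - 1)


def trace_ray(origin, direction, kappa, temperature, n, ds):
    """March one ray backward from origin through the unit cube.

    kappa and temperature are n x n x n nested lists (assumed layout).
    Returns the intensity arriving at origin along this direction.
    """
    tau = 0.0        # optical depth accumulated so far
    intensity = 0.0
    step = 0
    while True:
        # Sample the medium at the midpoint of the current step.
        s = (step + 0.5) * ds
        x = origin[0] + s * direction[0]
        y = origin[1] + s * direction[1]
        z = origin[2] + s * direction[2]
        if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0 and 0.0 <= z < 1.0):
            break    # ray left the domain; cold black walls add nothing
        i, j, k = cell_index(x, n), cell_index(y, n), cell_index(z, n)
        tau_new = tau + kappa[i][j][k] * ds
        i_black = SIGMA * temperature[i][j][k] ** 4 / math.pi
        # Emission of this segment, attenuated on its way to the origin.
        intensity += i_black * (math.exp(-tau) - math.exp(-tau_new))
        tau = tau_new
        step += 1
    return intensity


def div_q_at(point, kappa, temperature, n, n_rays=100, ds=0.01, seed=0):
    """Estimate the divergence of the radiative heat flux at point."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rays):
        total += trace_ray(point, random_direction(rng),
                           kappa, temperature, n, ds)
    mean_intensity = total / n_rays
    i, j, k = (cell_index(c, n) for c in point)
    i_black = SIGMA * temperature[i][j][k] ** 4 / math.pi
    # Gray, non-scattering medium: emission minus absorbed incoming radiation.
    return 4.0 * math.pi * kappa[i][j][k] * (i_black - mean_intensity)
```

Each ray in this sketch can, in principle, touch any cell in the domain, which illustrates the all-to-all data dependence described above. A production code would replace the fixed-step marching with exact cell-by-cell traversal and, as the abstract indicates, would use coarser AMR levels for data far from the origin cell.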
