Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs

Abstract

Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator splitting, allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread- and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge–Kutta ODE solver were both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both paradigms solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to determine any performance advantage offered by either paradigm. The first benchmark measured the runtime of evaluating the right-hand-side source terms in parallel, and the second integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge–Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi coprocessor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%–35% slower than the baseline, while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5–2.7×) and Xeon Phi coprocessor (4.7–4.9×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5–1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower vector units of the Sandy Bridge CPU or the Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into ODE solver methods that are both SIMD-friendly and computationally efficient.
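The link between vector width and divergence can be illustrated with a toy model. The sketch below is not the paper's analysis: it assumes each ODE needs a known number of adaptive steps, that ODEs are packed into lockstep groups of a given width, and that every group runs until its slowest member finishes. The function name `lane_utilization` and the step counts are hypothetical, chosen only for illustration.

```python
def lane_utilization(steps_per_ode, width):
    """Fraction of issued lane-steps that do useful work (toy model).

    Assumption: ODEs are packed in order into groups of `width` lanes,
    and each group executes as many steps as its slowest ODE requires.
    Lanes whose ODE has already finished sit idle but are still counted
    as issued, as are empty lanes in a partially filled final group.
    """
    useful = sum(steps_per_ode)
    issued = 0
    for start in range(0, len(steps_per_ode), width):
        group = steps_per_ode[start:start + width]
        issued += width * max(group)
    return useful / issued


# Hypothetical adaptive step counts for eight independent reactors;
# one reactor is much stiffer than the rest and needs many more steps.
steps = [10, 12, 11, 40, 10, 13, 12, 11]

for width in (1, 4, 8):
    print(width, round(lane_utilization(steps, width), 3))
```

In this model a width of 1 wastes nothing, while wider groups spend a growing share of lane-steps waiting on the slowest ODE, which is the qualitative effect the abstract attributes to adaptive step sizes on wide vector hardware.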
