A scalable weakly-synchronous algorithm for solving partial differential equations

Synchronization overheads pose a major challenge as applications advance towards extreme scales. In current large-scale algorithms for time-dependent partial differential equations (PDEs), synchronization and data communication delay the parallel computations at every time step, which creates a new scaling wall on the path to exascale. We present a weakly-synchronous algorithm based on novel asynchrony-tolerant (AT) finite-difference schemes that relax synchronization at a mathematical level. To implement communications suited to AT schemes efficiently, we use remote memory access programming schemes, which have been shown to provide significant speedup on modern supercomputers, and compare them with the two-sided communications that are the current state of practice. We present results from simulations of Burgers' equation as a model of multi-scale, strongly non-linear dynamical systems. The results demonstrate excellent scalability of the new AT schemes for large-scale computing, with speedups of up to $3.3\times$ in communication time and $2.19\times$ in total runtime. We expect that such schemes can form the basis for exascale PDE algorithms.
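To make the setting concrete, the following is a minimal sketch of what relaxed synchronization means for an explicit finite-difference update of the viscous Burgers' equation $u_t + u u_x = \nu u_{xx}$. It is not the AT scheme of the paper, whose coefficients are not given in this abstract: it uses a standard second-order central stencil and simply lets each subdomain read neighbour (halo) values that may be a few time levels old. The function names, the random delay model, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def burgers_step(u_hist, n_sub, max_delay, nu, dx, dt, rng):
    """Advance one explicit step on a periodic grid split into n_sub subdomains.

    Interior points use the latest time level; halo points (values owned by a
    neighbouring subdomain) may be read from an older time level.
    """
    u_now = u_hist[-1]
    n = len(u_now)
    size = n // n_sub
    u_new = [0.0] * n
    for p in range(n_sub):
        lo, hi = p * size, (p + 1) * size
        # Assumed delay model: halos are 0..max_delay steps stale, limited by
        # how much history exists so far.
        delay = rng.randint(0, min(max_delay, len(u_hist) - 1))
        u_old = u_hist[-1 - delay]
        for i in range(lo, hi):
            left = u_now[i - 1] if i > lo else u_old[(i - 1) % n]
            right = u_now[i + 1] if i < hi - 1 else u_old[(i + 1) % n]
            advection = u_now[i] * (right - left) / (2.0 * dx)
            diffusion = nu * (right - 2.0 * u_now[i] + left) / dx ** 2
            u_new[i] = u_now[i] + dt * (diffusion - advection)
    return u_new


def run(n=64, n_sub=4, max_delay=0, steps=200, nu=0.1, seed=0):
    """Integrate from a sine wave; max_delay=0 recovers the synchronous scheme."""
    rng = random.Random(seed)
    dx = 2.0 * math.pi / n
    dt = 0.1 * dx ** 2 / nu  # conservative explicit time step
    hist = [[math.sin(i * dx) for i in range(n)]]
    for _ in range(steps):
        hist.append(burgers_step(hist, n_sub, max_delay, nu, dx, dt, rng))
        hist = hist[-(max_delay + 1):]  # keep only the levels a halo may need
    return hist[-1]


if __name__ == "__main__":
    sync = run(max_delay=0)
    stale = run(max_delay=2)
    diff = max(abs(a - b) for a, b in zip(sync, stale))
    print("max difference between synchronous and delayed runs:", diff)
```

Running the script prints the largest pointwise difference between the synchronous run and the run with stale halos, which is the error that delayed data introduces when the stencil is left unchanged. Storing a short history of time levels per subdomain is the design choice that lets computation proceed without waiting for the newest halo data.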
