Evaluating Abstract Asynchronous Schwarz Solvers

With the arrival of the exascale computing era, most leadership-class supercomputers are heterogeneous and massively parallel even within a single node, combining many CPU cores with multiple co-processors such as GPUs. For example, each node of ORNL's Summit has six NVIDIA Tesla V100 GPUs and 42 IBM Power9 cores. Synchronizing across all these compute resources within a single node, let alone across multiple nodes, is prohibitively expensive. It is therefore necessary to develop and study asynchronous algorithms that avoid the cost of bulk-synchronous computing at massive scale. In this study, we examine an asynchronous version of the abstract Restricted Additive Schwarz method used as a solver. We do not synchronize explicitly; instead, data is communicated between the subdomains completely asynchronously, which removes the bulk-synchronous nature of the algorithm. We accomplish this with the one-sided RMA functions of the MPI standard. We study the benefits of such an asynchronous solver over its synchronous counterpart on both multi-core architectures and multiple GPUs. We also study the communication patterns and local solvers and their effect on the global solver. Finally, we show that this approach can deliver attractive runtime benefits over its synchronous counterpart.
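The difference between the synchronous and asynchronous variants can be illustrated with a minimal sketch. This is not the authors' implementation: it is a single-process Python emulation on a 1D Poisson model problem, and every name in it (`schwarz`, `thomas`, `halo`, `put`, `p_active`) as well as the problem sizes are assumptions made for illustration. The per-subdomain `halo` lists stand in for MPI RMA windows, and `put` stands in for a one-sided write such as `MPI_Put`. In asynchronous mode a random subset of subdomains updates in each tick, so the others keep working with stale boundary data. The global residual check is a simplification; a real asynchronous solver needs a distributed termination-detection scheme.

```python
import random

def thomas(rhs):
    """Solve tridiag(-1, 2, -1) u = rhs with the Thomas algorithm."""
    m = len(rhs)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for k in range(1, m):
        denom = 2.0 + c[k - 1]
        c[k] = -1.0 / denom
        d[k] = (rhs[k] + d[k - 1]) / denom
    u = [0.0] * m
    u[-1] = d[-1]
    for k in range(m - 2, -1, -1):
        u[k] = d[k] - c[k] * u[k + 1]
    return u

def schwarz(n=40, parts=4, overlap=3, asynchronous=False, p_active=0.6,
            tol=1e-10, max_ticks=20000, seed=0):
    """Restricted additive Schwarz as a solver for -u'' = 1, u(0) = u(1) = 0."""
    rng = random.Random(seed)
    h = 1.0 / (n + 1)
    b = [h * h] * n  # right-hand side scaled by h^2
    size = n // parts
    assert overlap < size, "each halo value must be owned by a direct neighbour"
    owned = [(i * size, n if i == parts - 1 else (i + 1) * size)
             for i in range(parts)]
    ext = [(max(0, s - overlap), min(n, e + overlap)) for s, e in owned]
    x = [0.0] * n  # assembled owned values
    halo = [[0.0, 0.0] for _ in range(parts)]  # emulated RMA windows

    def local_solve(i):
        lo, hi = ext[i]
        rhs = b[lo:hi]
        if lo > 0:
            rhs[0] += halo[i][0]   # Dirichlet value just left of the subdomain
        if hi < n:
            rhs[-1] += halo[i][1]  # Dirichlet value just right of the subdomain
        u = thomas(rhs)
        s, e = owned[i]
        return u[s - lo:e - lo]    # "restricted": keep the owned part only

    def put(i):
        # Emulated one-sided put: write into the neighbours' windows.
        if i > 0:
            halo[i - 1][1] = x[ext[i - 1][1]]
        if i < parts - 1:
            halo[i + 1][0] = x[ext[i + 1][0] - 1]

    def residual_norm():
        r = 0.0
        for k in range(n):
            left = x[k - 1] if k > 0 else 0.0
            right = x[k + 1] if k < n - 1 else 0.0
            r = max(r, abs(b[k] - (2.0 * x[k] - left - right)))
        return r

    for tick in range(1, max_ticks + 1):
        if asynchronous:
            # Only some subdomains update; no barrier, puts land immediately.
            active = [i for i in range(parts) if rng.random() < p_active]
            rng.shuffle(active)
            for i in active:
                s, e = owned[i]
                x[s:e] = local_solve(i)
                put(i)
        else:
            # Bulk-synchronous: everyone solves, then everyone exchanges.
            new = [local_solve(i) for i in range(parts)]
            for i, (s, e) in enumerate(owned):
                x[s:e] = new[i]
            for i in range(parts):
                put(i)
        if residual_norm() <= tol * max(b):
            return x, tick
    return x, max_ticks
```

Calling `schwarz(asynchronous=False)` and `schwarz(asynchronous=True)` returns the approximate solution and the number of ticks used. Tick counts in this emulation say nothing about wall-clock time, which is what the study actually measures.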
