Performance and Scalability of Hierarchical Hybrid Multigrid Solvers for Stokes Systems

In many applications involving incompressible fluid flow, the Stokes system plays an important role. Complex flow problems may require extremely fine resolutions, easily resulting in saddle-point problems with more than a trillion ($10^{12}$) unknowns. Even on the most advanced supercomputers, the fast solution of such systems of equations is a highly nontrivial and challenging task. In this work, we consider a realization of an iterative saddle-point solver that is based mathematically on the Schur-complement formulation for the pressure and algorithmically on the abstract concept of hierarchical hybrid grids. The design of our fast multigrid solver is guided by an innovative performance analysis of the computational kernels, combined with a quantification of the communication overhead. Excellent node performance and good scalability to almost a million parallel threads are demonstrated on different characteristic types of modern supercomputers.
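
For orientation, the Schur-complement formulation mentioned above can be sketched in generic notation. The symbols $A$ (discrete velocity operator), $B$ (discrete divergence), $f$, and $g$ are assumptions introduced here for illustration, and the unstabilized form is shown; the discretization actually used may include additional stabilization terms. The discrete Stokes system has the saddle-point structure

$$
\begin{pmatrix} A & B^{T} \\ B & 0 \end{pmatrix}
\begin{pmatrix} u \\ p \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix} .
$$

Eliminating the velocity via $u = A^{-1}(f - B^{T} p)$ gives a reduced system for the pressure alone,

$$
S\,p = B A^{-1} f - g, \qquad S := B A^{-1} B^{T},
$$

after which the velocity follows from $A u = f - B^{T} p$. In a solver of this type, $S$ is typically never assembled: each application of $S$ within an outer iteration requires one (approximate) solve with $A$, which is where a multigrid method would naturally enter.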
