Improving multigrid performance for unstructured mesh drift–diffusion simulations on 147,000 cores

SUMMARY This study considers the scaling of three algebraic multigrid aggregation schemes for a finite element discretization of a drift–diffusion system, specifically the drift–diffusion model for semiconductor devices. The approach is more general and can be applied to other systems of partial differential equations. After discretization on unstructured meshes, a fully coupled multigrid preconditioned Newton–Krylov solution method is employed. The choice of aggregation scheme for generating coarser levels has a significant impact on the performance and scalability of the multigrid preconditioner. For the test cases considered, the uncoupled aggregation scheme outperformed the two other approaches, including the baseline aggressive coarsening scheme. This scheme aggregates each node with its immediate neighbors and is followed by repartitioning and redistribution of the coarser level matrices onto a subset of the Message Passing Interface (MPI) processes. Scaling results are presented up to 147,456 cores on an IBM Blue Gene/P platform. A comparison of the scaling of a multigrid V-cycle and W-cycle is provided. Results for 65,536 cores demonstrate that the uncoupled aggregation scheme can reduce solution time by a factor of 3.5 relative to the baseline aggressive coarsening scheme. The reduction comes from a significantly lower iteration count, which results from the increased number of multigrid levels and the generation of better quality aggregates. Copyright © 2012 John Wiley & Sons, Ltd.
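To make "aggregating the immediate neighbors" concrete, the following is a minimal serial sketch of a greedy neighbor-based aggregation pass on a matrix graph. It is an illustration under stated assumptions, not the authors' parallel implementation: the function name `aggregate_immediate_neighbors`, the adjacency-list input, and the two-phase structure (root selection, then attaching leftover nodes) are all hypothetical choices made for this example.

```python
def aggregate_immediate_neighbors(adjacency):
    """Greedy aggregation sketch (hypothetical helper, serial only).

    adjacency[i] lists the graph neighbors of node i (the off-diagonal
    nonzero pattern of row i). Returns (agg, num_aggregates), where
    agg[i] is the aggregate id assigned to node i.
    """
    n = len(adjacency)
    agg = [-1] * n  # -1 marks a node that is not yet aggregated
    num_aggregates = 0

    # Phase 1: a free node whose neighbors are all free becomes a root
    # and forms a new aggregate together with its immediate neighbors.
    for i in range(n):
        if agg[i] == -1 and all(agg[j] == -1 for j in adjacency[i]):
            agg[i] = num_aggregates
            for j in adjacency[i]:
                agg[j] = num_aggregates
            num_aggregates += 1

    # Phase 2: attach each leftover node to a neighboring aggregate,
    # or make it a singleton aggregate if it has no aggregated neighbor.
    for i in range(n):
        if agg[i] == -1:
            for j in adjacency[i]:
                if agg[j] != -1:
                    agg[i] = agg[j]
                    break
            else:
                agg[i] = num_aggregates
                num_aggregates += 1

    return agg, num_aggregates


def path_graph(n):
    """Adjacency lists of a 1D chain of n nodes (assumed test problem)."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


if __name__ == "__main__":
    agg, count = aggregate_immediate_neighbors(path_graph(9))
    print(agg, count)
```

On a 9-node chain this sketch produces three aggregates, so the coarse level has roughly one third as many unknowns as the fine level. Slower coarsening of this kind yields more multigrid levels than an aggressive scheme that merges larger neighborhoods, which is the trade-off the summary refers to.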
