Optimization of serial and parallel communications for parallel geometric multigrid method

The parallel multigrid method is expected to play an important role in large-scale scientific computing on post-peta/exa-scale supercomputer systems, but it involves serial and parallel communication processes that are generally expensive. In the present work, a new format for sparse matrix storage based on the sliced Ellpack-Itpack (ELL) format is proposed to optimize serial communication (data transfer through memory), and hierarchical coarse grid aggregation (hCGA) is introduced to optimize parallel communication by message passing. The proposed methods are implemented in pGW3D-FVM, a parallel code for 3D groundwater flow simulations using the multigrid method, and the robustness and performance of the code were evaluated on up to 4,096 nodes (65,536 cores) of the Fujitsu FX10 supercomputer system at the University of Tokyo. The parallel multigrid solver using the sliced ELL format improved performance in both weak scaling (25%-31%) and strong scaling (9%-22%) compared to the code using the original ELL format. Moreover, hCGA improved performance further in both weak scaling (1.61 times) and strong scaling (6.27 times) for the flat MPI parallel programming model.
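
To make the storage idea concrete, the following is a minimal sketch of a generic sliced ELL layout and its matrix-vector product. It is not the format proposed in the paper, whose details the abstract does not give. The function names, the `slice_height` parameter, and the row representation (a list of `(column, value)` pairs per row) are assumptions chosen for illustration. Plain ELL pads every row to the length of the longest row in the whole matrix, whereas sliced ELL pads only within each group of consecutive rows, so fewer padded entries have to be moved through memory.

```python
def to_sliced_ell(rows, slice_height):
    """Pack rows into slices of `slice_height` consecutive rows.

    `rows` is a list with one entry per matrix row; each entry is a list of
    (column, value) pairs. Every slice is padded only up to the longest row
    inside that slice. Setting slice_height = len(rows) gives plain ELL.
    """
    slices = []
    for start in range(0, len(rows), slice_height):
        chunk = rows[start:start + slice_height]
        width = max((len(r) for r in chunk), default=0)
        # Padding uses column 0 with value 0.0, so it adds nothing to the product.
        cols = [[c for c, _ in r] + [0] * (width - len(r)) for r in chunk]
        vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in chunk]
        slices.append((start, width, cols, vals))
    return slices


def spmv_sliced_ell(slices, x, n_rows):
    """Compute y = A x for a matrix stored by to_sliced_ell."""
    y = [0.0] * n_rows
    for start, width, cols, vals in slices:
        for i in range(len(cols)):
            acc = 0.0
            for k in range(width):
                acc += vals[i][k] * x[cols[i][k]]
            y[start + i] = acc
    return y


def stored_entries(slices):
    """Number of stored values, padding included."""
    return sum(width * len(cols) for _, width, cols, _ in slices)


if __name__ == "__main__":
    # Hypothetical 4x4 matrix with uneven row lengths (2, 4, 2, 1 nonzeros).
    rows = [
        [(0, 4.0), (1, -1.0)],
        [(0, -1.0), (1, 4.0), (2, -1.0), (3, -1.0)],
        [(1, -1.0), (2, 4.0)],
        [(3, 4.0)],
    ]
    x = [1.0, 2.0, 3.0, 4.0]
    ell = to_sliced_ell(rows, slice_height=4)   # one slice: plain ELL
    sell = to_sliced_ell(rows, slice_height=2)  # two slices of two rows
    print(stored_entries(ell), stored_entries(sell))
    print(spmv_sliced_ell(sell, x, len(rows)))
```

In this small example, plain ELL stores 16 values and the two-row slices store 12, while both layouts give the same product. Production codes would typically use flat, contiguous arrays per slice rather than nested lists; the nested form is used here only for readability.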
