Implementing High-Performance Geometric Multigrid Solver with Naturally Grained Messages

Structured-grid linear solvers often require manual packing and unpacking of communication data to achieve high performance. Orchestrating this process efficiently is challenging, labor-intensive, and potentially error-prone. In this paper, we explore an alternative approach that communicates the data with naturally grained message sizes, without manual packing and unpacking. This approach is the distributed analogue of shared-memory programming, taking advantage of the global address space in PGAS languages to provide substantial programming ease. However, its performance may suffer from the large number of small messages. We investigate the runtime support required in the UPC++ library for this naturally grained version to close the performance gap between the two approaches and attain comparable performance at scale, using the High-Performance Geometric Multigrid (HPGMG-FV) benchmark as a driver.
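The contrast between the two approaches can be illustrated with a minimal single-process sketch in plain C++. It is not taken from the paper and does not use the UPC++ API: the `Box` and `Transport` types and the two `exchange_*` functions are hypothetical names introduced here, and `Transport::put` merely stands in for one remote transfer (it performs a local `memcpy` and counts calls). The sketch assumes a 3D box stored with `i` as the fastest-varying index, so a face at fixed `j` consists of `nz` contiguous runs of `nx` values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 3D box of doubles; i is the fastest-varying index.
struct Box {
    std::size_t nx, ny, nz;
    std::vector<double> data;
    Box(std::size_t nx_, std::size_t ny_, std::size_t nz_)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, 0.0) {}
    std::size_t index(std::size_t i, std::size_t j, std::size_t k) const {
        return i + nx * (j + ny * k);
    }
};

// Stand-in for a one-sided transfer: copies locally and counts messages.
// A real PGAS runtime would move the data between processes instead.
struct Transport {
    std::size_t messages = 0;
    void put(double* dst, const double* src, std::size_t count) {
        std::memcpy(dst, src, count * sizeof(double));
        ++messages;
    }
};

// Packed exchange: gather the face j = js into one contiguous buffer,
// send it as a single message, then scatter it into the face j = jd.
// Returns the number of messages issued.
std::size_t exchange_packed(const Box& src, std::size_t js,
                            Box& dst, std::size_t jd) {
    assert(src.nx == dst.nx && src.nz == dst.nz);
    Transport net;
    std::vector<double> send(src.nx * src.nz), recv(src.nx * src.nz);
    for (std::size_t k = 0; k < src.nz; ++k)   // manual pack
        std::memcpy(&send[k * src.nx], &src.data[src.index(0, js, k)],
                    src.nx * sizeof(double));
    net.put(recv.data(), send.data(), send.size());
    for (std::size_t k = 0; k < dst.nz; ++k)   // manual unpack
        std::memcpy(&dst.data[dst.index(0, jd, k)], &recv[k * dst.nx],
                    dst.nx * sizeof(double));
    return net.messages;
}

// Naturally grained exchange: each contiguous run is sent directly to its
// final location, with no pack or unpack step, at the cost of more messages.
std::size_t exchange_natural(const Box& src, std::size_t js,
                             Box& dst, std::size_t jd) {
    assert(src.nx == dst.nx && src.nz == dst.nz);
    Transport net;
    for (std::size_t k = 0; k < src.nz; ++k)
        net.put(&dst.data[dst.index(0, jd, k)],
                &src.data[src.index(0, js, k)], src.nx);
    return net.messages;
}
```

Both functions leave the destination face with identical contents. The packed version issues one large message plus two extra copy loops, while the naturally grained version issues `nz` small messages and needs no staging buffers, which is the trade-off the abstract describes.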
