Evaluation of PGAS Communication Paradigms with Geometric Multigrid

Partitioned Global Address Space (PGAS) languages and one-sided communication enable application developers to select the communication paradigm that balances the performance needs of applications with the productivity desires of programmers. In this paper, we evaluate three different one-sided communication paradigms in the context of geometric multigrid using the miniGMG benchmark. Although miniGMG's static, regular, and predictable communication does not exploit the ultimate potential of PGAS models, multigrid solvers appear in many contemporary applications and represent one of the most important communication patterns. We use UPC++, a PGAS extension of C++, as the vehicle for our evaluation, though our work is applicable to any of the existing PGAS languages and models. We compare performance with the highly tuned MPI baseline, and the results indicate that the most promising approach towards achieving performance and ease of programming is to use high-level abstractions, such as the multidimensional arrays provided by UPC++, that hide data aggregation and messaging in the runtime library.
