Evaluation of the partitioned global address space (PGAS) model for an inviscid Euler solver

In this paper we evaluate the performance of Unified Parallel C (UPC), which implements the partitioned global address space (PGAS) programming model, using a numerical method that is widely used in fluid dynamics. To assess the incremental approach to parallelization that UPC makes possible, and its performance characteristics, we implement the UPC code at several levels of optimization and compare it with an MPI parallelization on four clusters of the Austrian HPC infrastructure (LEO3, LEO3E, VSC2, VSC3) as well as on an Intel Xeon Phi. We find that UPC is significantly easier to develop with than MPI and that its performance is comparable to MPI in most situations: the results show worse performance on VSC2, competitive performance on LEO3, LEO3E, and VSC3, and superior performance on the Intel Xeon Phi.
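
To give a concrete sense of what the incremental, shared-array style looks like, the sketch below applies a first-order upwind finite-volume update to a 1D linear advection equation in UPC. It is a hypothetical illustration written for this summary, not the paper's Euler solver, and the names (BLOCK, u, unew) are invented for the example.

```c
/* Hypothetical sketch (not from the paper): 1D first-order upwind update
 * for linear advection, written in UPC with shared arrays and upc_forall.
 * Neighbour cells owned by other threads are read directly through the
 * global address space, so no explicit communication code is needed. */
#include <upc_relaxed.h>
#include <stdio.h>

#define BLOCK 256                    /* cells owned by each thread */
#define N     (BLOCK * THREADS)      /* total number of cells      */

shared [BLOCK] double u[N];          /* cell averages, blocked per thread */
shared [BLOCK] double unew[N];

int main(void)
{
    const double a  = 1.0;           /* advection speed */
    const double dx = 1.0 / N;
    const double dt = 0.4 * dx;      /* CFL number 0.4  */
    int i, step;

    /* Initial condition: a step profile; each thread writes its own cells. */
    upc_forall(i = 0; i < N; i++; &u[i])
        u[i] = (i < N / 2) ? 1.0 : 0.0;
    upc_barrier;

    for (step = 0; step < 100; step++) {
        /* u[i-1] may live on another thread, but it is indexed like a
         * local array element: the incremental, shared-memory-like style. */
        upc_forall(i = 1; i < N; i++; &u[i])
            unew[i] = u[i] - a * dt / dx * (u[i] - u[i - 1]);
        upc_barrier;

        upc_forall(i = 1; i < N; i++; &u[i])
            u[i] = unew[i];
        upc_barrier;
    }

    if (MYTHREAD == 0)
        printf("u[N/2] after 100 steps: %f\n", u[N / 2]);
    return 0;
}
```

With the Berkeley UPC toolchain such a program would typically be compiled with upcc and launched with upcrun (for example, upcrun -n 4 ./advect), though the exact commands depend on the installation.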

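For contrast, a minimal MPI counterpart of the same update is sketched below, assuming one block of cells per rank. The explicit ghost cell and MPI_Sendrecv halo exchange illustrate the kind of bookkeeping that the productivity comparison refers to; the code is not taken from the paper's MPI implementation.

```c
/* Hypothetical MPI counterpart of the same upwind update (not the paper's
 * MPI code): one block of cells per rank, with an explicit left ghost cell
 * filled by a halo exchange every time step. */
#include <mpi.h>
#include <stdio.h>

#define BLOCK 256   /* cells per rank, mirroring the UPC sketch */

int main(int argc, char **argv)
{
    int rank, size, i, step;
    double u[BLOCK + 1], unew[BLOCK + 1];   /* index 0 is the left ghost cell */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const double a  = 1.0;
    const double dx = 1.0 / (BLOCK * size);
    const double dt = 0.4 * dx;

    /* Initial condition: the same step profile as in the UPC sketch. */
    for (i = 1; i <= BLOCK; i++) {
        int gi = rank * BLOCK + (i - 1);            /* global cell index */
        u[i] = (gi < BLOCK * size / 2) ? 1.0 : 0.0;
    }
    u[0] = u[1];   /* inflow/ghost value; overwritten by the exchange below */

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (step = 0; step < 100; step++) {
        /* Explicit halo exchange: send the rightmost cell to rank+1 and
         * receive the left ghost cell from rank-1. */
        MPI_Sendrecv(&u[BLOCK], 1, MPI_DOUBLE, right, 0,
                     &u[0],     1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        for (i = 1; i <= BLOCK; i++)
            unew[i] = u[i] - a * dt / dx * (u[i] - u[i - 1]);
        for (i = 1; i <= BLOCK; i++)
            u[i] = unew[i];
    }

    if (rank == size - 1)
        printf("last cell after 100 steps: %f\n", u[BLOCK]);

    MPI_Finalize();
    return 0;
}
```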