An evaluation of global address space languages: Co-array Fortran and Unified Parallel C

Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming. These languages boost programmer productivity by providing shared variables for inter-process communication instead of message passing. However, the performance of these emerging languages still has room for improvement. In this paper, we study the performance of variants of the NAS MG, CG, SP, and BT benchmarks on several modern architectures to identify challenges that must be met to deliver top performance. We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication. Even so, our experiments uncovered significant performance bottlenecks in UPC codes on all platforms. We trace these bottlenecks to root causes that include the synchronization model, the communication efficiency of strided data, and source-to-source translation issues. We show that these problems can be remedied with language extensions, new synchronization constructs, and adequate optimizations by the back-end C compilers.
