Evaluating support for global address space languages on the Cray X1

The Cray X1 was recently introduced as the first in a new line of parallel systems to combine high-bandwidth vector processing with an MPP system architecture. Alongside capabilities such as automatic fine-grained data parallelism through vector instructions, the X1 offers hardware support for a transparent global address space (GAS), which makes it an interesting target for GAS languages. In this paper, we describe our experience developing a portable, open-source, high-performance compiler for Unified Parallel C (UPC), an SPMD global address space extension of ISO C. As part of our implementation effort, we evaluate the X1's hardware support for GAS languages and provide empirical performance characterizations of how the Berkeley UPC compiler can leverage features such as vectorization and global pointers. We discuss several difficulties encountered with the Cray C compiler that are likely to present challenges for many users, especially implementors of libraries and source-to-source translators. Finally, we analyze the performance of our compiler on several benchmark programs and show that, although the current compilation approach has some limitations, the Berkeley UPC compiler uses the X1 network more effectively than MPI or SHMEM and generates serial code whose vectorizability is comparable to that of the original C code.
