The design and implementation of a parallel array operator for the arbitrary remapping of data

Gather and scatter are data redistribution functions of long-standing importance to high-performance computing. In this paper, we present a highly general array operator whose gather and scatter capabilities are unmatched by other array languages. We discuss an efficient parallel implementation and introduce three new optimizations (schedule compression, dead array reuse, and direct communication) that reduce the costs associated with the operator's wide applicability. With our implementation of this operator in ZPL, we demonstrate performance comparable to the hand-coded Fortran + MPI versions of the NAS FT and CG benchmarks.
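For readers unfamiliar with the two operations, the following is a minimal sequential sketch of gather and scatter in Python. It is not ZPL syntax and says nothing about the parallel implementation or the optimizations named above; the function names `gather` and `scatter`, the `fill` parameter, and the last-write-wins rule for colliding indices are assumptions made for illustration only.

```python
def gather(src, index):
    """Gather: dst[i] = src[index[i]] (reads are indirect)."""
    return [src[j] for j in index]


def scatter(src, index, size, fill=0):
    """Scatter: dst[index[i]] = src[i] (writes are indirect).

    Assumption for this sketch: if two entries of `index` collide,
    the later write wins; positions never written keep `fill`.
    """
    dst = [fill] * size
    for i, j in enumerate(index):
        dst[j] = src[i]
    return dst


a = [10, 20, 30, 40]
perm = [2, 0, 3, 1]              # a permutation, so no collisions occur

g = gather(a, perm)              # [30, 10, 40, 20]
s = scatter(g, perm, len(a))     # scattering through the same map undoes the gather
```

When the index map is a permutation, as here, scatter through that map is the inverse of gather through it. On a distributed array, either operation may require moving elements between processors, which is where the communication cost comes from.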
