Runtime Performance of Parallel Array Assignment: An Empirical Study

Generating code for the array assignment statement of High Performance Fortran (HPF) is considered difficult when the data arrays have block-cyclic distributions, and several algorithms have been published to solve this problem. We present a comprehensive study of the run-time performance of the code these algorithms generate. We classify the algorithms into families, identify issues of interest in the generated code, and present experimental performance data for each algorithm. We demonstrate that the code generated for block-cyclic distributions runs almost as efficiently as that generated for block or cyclic distributions.
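
To make the problem concrete, the following is a minimal sketch of the index arithmetic behind a block-cyclic, i.e. cyclic(k), distribution. It is not one of the algorithms studied here: it is a brute-force reference that tests every element of an array section, which is exactly the cost the published algorithms are designed to avoid. It assumes zero-based global indices and a one-dimensional array; the function names (owner, local_address, local_accesses) are hypothetical and chosen only for illustration.

```python
def owner(i, k, p):
    """Processor that owns global index i under cyclic(k) over p processors."""
    return (i // k) % p


def local_address(i, k, p):
    """Offset of global index i in the local memory of its owning processor."""
    # Each full pass over all p processors places k elements on each one.
    return (i // (k * p)) * k + i % k


def local_accesses(lo, hi, stride, k, p, proc):
    """Local addresses that processor proc touches for the section lo:hi:stride.

    Brute force (assumption: for illustration only): every section element is
    tested for ownership, so the cost grows with the section size rather than
    with the number of locally owned elements.
    """
    return [
        local_address(i, k, p)
        for i in range(lo, hi + 1, stride)
        if owner(i, k, p) == proc
    ]


if __name__ == "__main__":
    # Block size 4 on 3 processors, section 0:23:5 (elements 0, 5, 10, 15, 20).
    print(local_accesses(0, 23, 5, k=4, p=3, proc=0))
```

With k = 1 the mapping reduces to a cyclic distribution, and with k = ceil(n/p) for an array of n elements it reduces to a block distribution, which is why both can be viewed as special cases of cyclic(k).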
