Generating local addresses and communication sets for data-parallel programs

Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We show that for an array <italic>A</italic> affinely aligned to a <italic>template</italic> that is distributed across <italic>p</italic> processors with a <italic>cyclic(k)</italic> distribution, and a computation involving the regular section <italic>A(l:h:s)</italic>, the local memory access sequence for any processor is characterized by a finite state machine of at most <italic>k</italic> states. We present fast algorithms for computing the essential information about these state machines, and extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little runtime overhead and acceptable preprocessing time.
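The finite-state-machine claim can be checked by brute force on small cases. The following is a minimal sketch, not the fast algorithms the abstract refers to. It assumes an identity alignment, 0-based indices, and a positive stride, and the function names `owner_and_local` and `access_table` are hypothetical. It enumerates the section, keeps the elements owned by one processor, and records, for each offset within a block, the next offset and the gap in local memory.

```python
def owner_and_local(i, p, k):
    """Owner processor and local address of cell i under a cyclic(k) distribution.

    Assumes identity alignment and 0-based indexing (illustrative assumptions).
    """
    course, within = divmod(i, p * k)  # round of p blocks, position inside that round
    return within // k, course * k + within % k


def access_table(l, h, s, p, k, m):
    """Brute-force the local access sequence of processor m for section l:h:s (s > 0).

    Returns the local addresses in access order and a transition table
    mapping block offset -> (next block offset, local memory gap).
    """
    placed = [owner_and_local(i, p, k) for i in range(l, h + 1, s)]
    addrs = [addr for proc, addr in placed if proc == m]
    table = {}
    for cur, nxt in zip(addrs, addrs[1:]):
        state = cur % k                # offset within the block acts as the state
        step = (nxt % k, nxt - cur)
        # The same offset must always lead to the same next offset and gap.
        assert table.setdefault(state, step) == step
    return addrs, table


addrs, table = access_table(0, 48, 3, p=2, k=4, m=0)
print(addrs)  # [0, 3, 5, 10, 12, 15, 17, 22, 24]
print(table)  # {0: (3, 3), 3: (1, 2), 1: (2, 5), 2: (0, 2)}
```

In this small case the table has four entries, matching the bound of at most k = 4 states, and the gap sequence 3, 2, 5, 2 repeats for as long as the section continues.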
