Efficient Computation of Address Sequences in Data Parallel Programs Using Closed Forms for Basis Vectors

In data-parallel languages such as High Performance Fortran (HPF), arrays are mapped to processors through a two-step process: alignment followed by distribution. This mapping partitions each array into disjoint pieces that are locally owned by individual processors. An HPF compiler that generates code for array statements must compute the sequence of local memory addresses accessed by each processor, as well as the sequence of sends and receives needed for a processor to access nonlocal data. In this paper, we present an approach to the address sequence generation problem based on the theory of integer lattices. The set of elements referenced can be generated by integer linear combinations of basis vectors. Unlike other work on this problem, we derive closed-form expressions for the basis vectors as a function of the data mapping. Using these basis vectors, and exploiting the fact that the access sequence contains a repeating pattern, we derive highly optimized code that generates the pattern at run time; the generated code enumerates addresses by table lookup of the pattern. Experimental results show that our approach is faster than other solutions to this problem.
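
To make the repeating-pattern idea concrete, the listing below is a minimal, self-contained sketch in C; it is not taken from the paper, and all parameter names (P, k, l, s, n, p) are illustrative assumptions. For a one-dimensional array distributed CYCLIC(k) over P processors and an array section with lower bound l and stride s, the ownership and within-block offset of the accessed elements repeat every lcm(s, P*k)/s iterations, after which the local addresses advance by a fixed shift. The sketch tabulates one period of this pattern by direct enumeration and then replays the table to produce the full local address sequence; the paper's contribution is to derive the pattern analytically from closed-form basis vectors rather than by enumeration.

    /* A minimal sketch (assumed parameter names, not the paper's code):
     * generate the local memory addresses touched by processor p for the
     * array section A(l : l+(n-1)*s : s) under a CYCLIC(k) distribution
     * over P processors, by tabulating one period of the repeating
     * pattern and replaying it with a constant address shift. */
    #include <stdio.h>
    #include <stdlib.h>

    static long gcd(long a, long b) { while (b) { long t = a % b; a = b; b = t; } return a; }

    /* Owner and local address of global index i under CYCLIC(k) over P processors. */
    static long owner(long i, long k, long P)      { return (i / k) % P; }
    static long local_addr(long i, long k, long P) { return (i / (P * k)) * k + i % k; }

    int main(void)
    {
        long P = 4, k = 3, p = 1;      /* distribution parameters (illustrative) */
        long l = 2, s = 5, n = 40;     /* array section A(l : l+(n-1)*s : s)     */

        /* Global indices l + j*s repeat their ownership/offset pattern every
         * period = lcm(s, P*k)/s iterations of j; after each period the
         * local address grows by a fixed shift = (lcm(s, P*k)/(P*k)) * k.   */
        long cycle  = P * k;
        long lcm    = s / gcd(s, cycle) * cycle;
        long period = lcm / s;
        long shift  = (lcm / cycle) * k;

        /* Tabulate one period by direct enumeration.                        */
        long *joff = malloc(period * sizeof *joff);  /* iteration offsets    */
        long *addr = malloc(period * sizeof *addr);  /* matching local addrs */
        long  tlen = 0;
        for (long j = 0; j < period; j++) {
            long i = l + j * s;
            if (owner(i, k, P) == p) {
                joff[tlen] = j;
                addr[tlen] = local_addr(i, k, P);
                tlen++;
            }
        }

        /* Replay the table to produce the full local address sequence.      */
        for (long j0 = 0, base = 0; j0 < n; j0 += period, base += shift)
            for (long t = 0; t < tlen && j0 + joff[t] < n; t++)
                printf("local address %ld\n", base + addr[t]);

        free(joff);
        free(addr);
        return 0;
    }

After the one-time tabulation, each generated address costs only a table lookup and an addition, which is the property the table-lookup scheme described in the abstract exploits.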
