Mobile and replicated alignment of arrays in data-parallel programs

When a data-parallel language like Fortran 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping of objects to processors determines the amount of residual communication needed to bring the operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. The authors solve two facets of the problem of finding alignments that reduce residual communication: determining which alignments should vary within loops, and determining which objects should have replicated alignments. They show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and they provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. They also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. Finally, they propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication incurred by replication.
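
The replication decision can be posed as a minimum s-t cut. The sketch below is a generic max-flow/min-cut formulation (Edmonds-Karp) under an assumed cost model, not necessarily the authors' exact construction. Each dataflow edge carries a hypothetical size, objects forced to be replicated (for example, results of spread operations) are tied to the source, objects that must stay distributed are tied to the sink, and an edge is assumed to cost broadcast communication only when a non-replicated producer feeds a replicated consumer. The function name `min_cut_replication`, the `<S>`/`<T>` sentinels, and the sizes in the usage lines are illustrative.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_cut_replication(flows, must_replicate, must_partition):
    """Choose which objects to replicate via a minimum s-t cut (a sketch).

    flows: iterable of (producer, consumer, size) dataflow edges.
    Assumed cost model: an edge costs `size` units of broadcast only when
    its producer is NOT replicated but its consumer IS replicated.
    must_replicate: objects forced onto the replicated side (e.g. spread results).
    must_partition: objects forced onto the non-replicated side.
    Returns (set of replicated objects, total broadcast cost).
    """
    src, snk = "<S>", "<T>"  # hypothetical sentinel node names
    cap = defaultdict(lambda: defaultdict(float))
    for producer, consumer, size in flows:
        # Arc runs consumer -> producer, so it crosses the cut exactly when the
        # consumer is on the source (replicated) side and the producer is not.
        cap[consumer][producer] += size
    for obj in must_replicate:
        cap[src][obj] = INF
    for obj in must_partition:
        cap[obj][snk] = INF

    # Undirected adjacency so the search can also follow residual (reverse) arcs.
    adj = defaultdict(set)
    for u in list(cap):
        for v in list(cap[u]):
            adj[u].add(v)
            adj[v].add(u)

    total = 0.0
    while True:
        # Edmonds-Karp: breadth-first search for a shortest augmenting path.
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            break  # no augmenting path: `parent` now holds the source side
        bottleneck, v = INF, snk
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        if bottleneck == INF:
            raise ValueError("an object is forced both replicated and not")
        v = snk
        while parent[v] is not None:  # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

    # Nodes still reachable from the source form the replicated side.
    return set(parent) - {src}, total


replicated, cost = min_cut_replication(
    flows=[("A", "B", 10), ("B", "D1", 30), ("B", "D2", 30)],
    must_replicate={"D1", "D2"},
    must_partition={"A"},
)
print(sorted(replicated), cost)  # ['B', 'D1', 'D2'] 10.0
```

In this made-up example the intermediate object B feeds two replicated consumers, so replicating B and broadcasting its 10-unit input once is cheaper than broadcasting B separately to D1 and D2 (30 + 30). The source side returned is the set of nodes reachable in the final residual graph, which is the same for every maximum flow, so the answer does not depend on which augmenting paths the search happens to find.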
