Algorithms for Automatic Alignment of Arrays

Aggregate data objects (such as arrays) are distributed across the processor memories when compiling a data-parallel language for a distributed-memory machine. The mapping determines the amount of communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: an alignment that maps all the objects to an abstract template, followed by a distribution that maps the template to the processors. This paper describes algorithms for solving the various facets of the alignment problem: axis and stride alignment, static and mobile offset alignment, and replication labeling. We show that optimal axis and stride alignment is NP-complete for general program graphs and give a heuristic method that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. We also show how local graph contractions can reduce the size of the problem significantly without changing the best solution, which allows more complex and effective heuristics to be used. We show how to model the static offset alignment problem using linear programming, and we show that loop-dependent mobile offset alignment is sometimes necessary for optimum performance. We describe an algorithm for determining mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself or can be used to improve performance. We describe an algorithm based on network flow that replicates objects so as to minimize the total amount of broadcast communication in replication.
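
The two-stage mapping and its effect on communication can be illustrated with a small sketch. It is not taken from the paper: the function names (`align`, `distribute`, `remote_operands`), the one-dimensional template, and the block distribution are all assumptions chosen for illustration. The sketch counts how many element pairs of an operation such as `A(i) + B(i + shift)` end up on different processors under a given pair of offset alignments.

```python
def align(i, stride=1, offset=0):
    """Alignment stage (hypothetical): map array index i to a template cell."""
    return stride * i + offset


def distribute(cell, block_size):
    """Distribution stage (assumed block distribution): template cell -> processor."""
    return cell // block_size


def remote_operands(n, offset_a, offset_b, shift, block_size):
    """Count indices i in [0, n) where A(i) and B(i + shift) sit on different processors."""
    count = 0
    for i in range(n):
        proc_a = distribute(align(i, offset=offset_a), block_size)
        proc_b = distribute(align(i + shift, offset=offset_b), block_size)
        if proc_a != proc_b:
            count += 1
    return count


# Identical offsets: operands that straddle a block boundary need communication.
naive = remote_operands(n=16, offset_a=0, offset_b=0, shift=1, block_size=4)

# Shifting B's offset by -1 places B(i + 1) on the same template cell as A(i).
aligned = remote_operands(n=16, offset_a=0, offset_b=-1, shift=1, block_size=4)
```

Under these assumptions, choosing the offset of `B` to cancel the shift removes every remote operand, which is the kind of saving that offset alignment aims for. A real compiler would have to weigh such choices across all operations in the program rather than one statement at a time.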
