Author retrospective: compilation techniques for block-cyclic distributions

Compilers for data-parallel languages use data distribution specifications to guide code generation for distributed-memory machines. Our 1994 paper described how to generate efficient code for programs that employ block-cyclic data distributions. In subsequent work at Rice University, Darte, Mellor-Crummey, Fowler, and Chavarría-Miranda developed a more general form of block-cyclic partitioning known as generalized multipartitioning. Generalized multipartitionings provide additional balance properties that make them useful for parallelizing computations that solve recurrences along spatial dimensions. These partitionings were subsequently implemented in Rice University's dHPF compiler for High Performance Fortran. The field has changed in many ways in the years since; however, variants of block-cyclic data distributions are still used today by modern parallel programming models, algorithms, and compilers. Original paper: http://dx.doi.org/10.1145/181181.181572
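For readers unfamiliar with the term, a one-dimensional block-cyclic distribution with block size k over p processors deals out consecutive blocks of k elements to the processors in round-robin order. The following is a minimal sketch of that index mapping, not the code generation scheme from the 1994 paper. It assumes zero-based indexing, and the function names `block_cyclic_owner` and `local_to_global` are hypothetical, chosen only for illustration.

```python
def block_cyclic_owner(i, k, p):
    """Map global index i to (processor, local index).

    Assumes a 1-D block-cyclic distribution with block size k
    over p processors and zero-based indices (an assumption of
    this sketch, not a convention taken from the paper).
    """
    block = i // k            # which global block holds element i
    proc = block % p          # blocks are dealt round-robin to processors
    local_block = block // p  # how many earlier blocks this processor owns
    local = local_block * k + i % k
    return proc, local


def local_to_global(proc, local, k, p):
    """Inverse mapping: recover the global index from (processor, local index)."""
    local_block, offset = divmod(local, k)
    return (local_block * p + proc) * k + offset


if __name__ == "__main__":
    # Show which processor owns each of the first 12 elements for k=2, p=3.
    for i in range(12):
        print(i, block_cyclic_owner(i, k=2, p=3))
```

With k = 1 this reduces to a purely cyclic distribution, and with k at least as large as the array extent divided by p (rounded up) it reduces to a block distribution, which is why block-cyclic distributions are usually described as generalizing both.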
