Communication-Free Parallelization via Affine Transformations

The paper describes a parallelization algorithm for programs consisting of arbitrary nestings of loops and sequences of loops. The code produced by our algorithm yields all the degrees of communication-free parallelism that can be obtained via loop fission, fusion, interchange, reversal, skewing, scaling, reindexing and statement reordering. The algorithm first assigns the iterations of instructions in the program to processors via affine processor mappings, then generates the correct code by ensuring that the code executed by each processor is a subsequence of the original sequential execution sequence.

[1]  Ken Kennedy,et al.  Automatic decomposition of scientific programs for parallel execution , 1987, POPL '87.

[2]  Jordi Torres,et al.  Partitioning the statement per iteration space using non-singular matrices , 1993, ICS '93.

[3]  Ken Kennedy,et al.  Optimizing for parallelism and data locality , 1992, ICS '92.

[4]  FeautrierLaboratoire Masi Some Eecient Solutions to the Aane Scheduling Problem Part Ii Multidimensional Time , 1992 .

[5]  TimePaul FeautrierLaboratoire Masi Some Eecient Solutions to the Aane Scheduling Problem Part I One-dimensional Time , 1993 .

[6]  Vivek Sarkar,et al.  A general framework for iteration-reordering loop transformations , 1992, PLDI '92.

[7]  W. Pugh,et al.  A framework for unifying reordering transformations , 1993 .

[8]  P. Feautrier Some Eecient Solutions to the Aane Scheduling Problem Part I One-dimensional Time , 1996 .

[9]  Monica S. Lam,et al.  A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..

[10]  P. Sadayappan,et al.  Communication-Free Hyperplane Partitioning of Nested Loops , 1993, J. Parallel Distributed Comput..

[11]  Corinne Ancourt,et al.  Scanning polyhedra with DO loops , 1991, PPOPP '91.

[12]  Jordi Torres,et al.  Align and Distribute-based Linear Loop Transformations , 1993, LCPC.

[13]  Utpal Banerjee,et al.  Speedup of ordinary programs , 1979 .

[14]  Michael E. Wolf,et al.  Improving locality and parallelism in nested loops , 1992 .

[15]  Monica S. Lam,et al.  Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.

[16]  Ken Kennedy,et al.  Automatic translation of FORTRAN programs to vector form , 1987, TOPL.

[17]  Ken Kennedy,et al.  Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution , 1993, LCPC.

[18]  Monica S. Lam,et al.  Communication optimization and code generation for distributed memory machines , 1993, PLDI '93.

[19]  M. Wolfe,et al.  Massive parallelism through program restructuring , 1990, [1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation.

[20]  Ken Kennedy,et al.  Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.