Complexity of Multi-dimensional Loop Alignment

Loop alignment is a classical program transformation that can enable the fusion of parallel loops, thereby increasing locality and reducing the number of synchronizations. Although the problem is quite old in the one-dimensional case (i.e., no nested loops), it came back recently - with a multi-dimensional form - when trying to refine parallelization algorithms based on multi-dimensional schedules. The main result of this paper is that, unlike the problem in 1D, finding a multi-dimensional shift of statements that makes an innermost loop parallel is strongly NP-complete. Nevertheless, we identify some polynomially-solvable cases that can occur in practice and we show that the general problemcan be stated as a systemof integer linear constraints.

[1]  Frédéric Vivien,et al.  Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling , 1997, Parallel Process. Lett..

[2]  Ken Kennedy,et al.  Loop distribution with arbitrary control flow , 1990, Proceedings SUPERCOMPUTING '90.

[3]  W. Kelly,et al.  Code generation for multiple mappings , 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.

[4]  Ken Kennedy,et al.  Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution , 1993, LCPC.

[5]  Ken Kennedy,et al.  Automatic decomposition of scientific programs for parallel execution , 1987, POPL '87.

[6]  Leslie Lamport,et al.  The parallel execution of DO loops , 1974, CACM.

[7]  Monica S. Lam,et al.  A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..

[8]  J K Peir Program partitioning and synchronization on multiprocessor systems , 1986 .

[9]  Barbara M. Chapman,et al.  Supercompilers for parallel and vector computers , 1990, ACM Press frontier series.

[10]  Michael Wolfe,et al.  High performance compilers for parallel computing , 1995 .

[11]  Yves Robert,et al.  Scheduling and Automatic Parallelization , 2000, Birkhäuser Boston.

[12]  Pierre Boulet,et al.  Loop Parallelization Algorithms: From Parallelism Extraction to Code Generation , 1998, Parallel Comput..

[13]  William Pugh,et al.  Selecting Affine Mappings Based on Performance Estimation , 1994, Parallel Process. Lett..

[14]  Jih-Kwon Peir,et al.  Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors , 1989, IEEE Trans. Computers.

[15]  FeautrierPaul Some efficient solutions to the affine scheduling problem , 1992 .

[16]  Sanjay V. Rajopadhye,et al.  Generation of Efficient Nested Loops from Polyhedra , 2000, International Journal of Parallel Programming.

[17]  Alain Darte,et al.  Loop Shifting for Loop Compaction , 1999, LCPC.

[18]  Alain Darte,et al.  On the complexity of loop fusion , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).

[19]  Kunio Okuda,et al.  Cycle Shrinking by Dependence Reduction , 1996, Euro-Par, Vol. I.

[20]  Monica S. Lam,et al.  Maximizing parallelism and minimizing synchronization with affine transforms , 1997, POPL '97.

[21]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[22]  Scott A. Mahlke,et al.  High-level synthesis of nonprogrammable hardware accelerators , 2000, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors.

[23]  Paul Feautrier,et al.  Construction of Do Loops from Systems of Affine Constraints , 1995, Parallel Process. Lett..

[24]  Edwin Hsing-Mean Sha,et al.  Polynomial-time nested loop fusion with full parallelism , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.