Paper information - Complexity of Multi-dimensional Loop Alignment

Complexity of Multi-dimensional Loop Alignment

Loop alignment is a classical program transformation that can enable the fusion of parallel loops, thereby increasing locality and reducing the number of synchronizations. Although the problem is quite old in the one-dimensional case (i.e., no nested loops), it came back recently, in a multi-dimensional form, when trying to refine parallelization algorithms based on multi-dimensional schedules. The main result of this paper is that, unlike the problem in 1D, finding a multi-dimensional shift of statements that makes an innermost loop parallel is strongly NP-complete. Nevertheless, we identify some polynomially solvable cases that can occur in practice and we show that the general problem can be stated as a system of integer linear constraints.
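As an illustration of the one-dimensional case mentioned in the abstract, the following minimal sketch shows how shifting one statement by a constant lets two loops be fused while keeping the fused loop parallel. It is not taken from the paper: the function names (`reference`, `aligned_fused`), the arrays, and the arithmetic are all hypothetical, and the `order` parameter exists only to show that the fused iterations may run in any order.

```python
def reference(b, a0):
    """Two separate loops: S1 writes a[i], S2 reads a[i - 1]."""
    n = len(b)
    a = [a0] + [0] * (n - 1)
    c = [0] * n
    for i in range(1, n):          # S1
        a[i] = b[i] + 1
    for i in range(1, n):          # S2 depends on S1 at distance 1
        c[i] = a[i - 1] * 2
    return a, c


def aligned_fused(b, a0, order=None):
    """Fused loop after shifting S2 by one iteration (hypothetical example).

    Naive fusion would make iteration i read a[i - 1], written by
    iteration i - 1, so the fused loop would be sequential. After the
    shift, iteration i reads only the a[i] it has just written.
    """
    n = len(b)
    a = [a0] + [0] * (n - 1)
    c = [0] * n
    if n < 2:
        return a, c
    c[1] = a[0] * 2                # prologue: first instance of S2
    body = list(range(1, n - 1))
    if order is not None:          # any permutation is legal here
        body = order(body)
    for i in body:
        a[i] = b[i] + 1            # S1 at iteration i
        c[i + 1] = a[i] * 2        # S2 at iteration i + 1, shifted
    a[n - 1] = b[n - 1] + 1        # epilogue: last instance of S1
    return a, c


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9]
    print(reference(data, 7))
    print(aligned_fused(data, 7, order=lambda xs: xs[::-1]))
```

Running the fused body in reverse order gives the same arrays as the two original loops, which is what makes the fused loop parallel. In the multi-dimensional setting the shift becomes a vector per statement, and the abstract states that finding such shifts is strongly NP-complete in general.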

Alain Darte | Guillaume Huard

[1] Frédéric Vivien, et al. Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling, 1997, Parallel Process. Lett.

[2] Ken Kennedy, et al. Loop distribution with arbitrary control flow, 1990, Proceedings SUPERCOMPUTING '90.

[3] W. Kelly, et al. Code generation for multiple mappings, 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.

[4] Ken Kennedy, et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, 1993, LCPC.

[5] Ken Kennedy, et al. Automatic decomposition of scientific programs for parallel execution, 1987, POPL '87.

[6] Leslie Lamport, et al. The parallel execution of DO loops, 1974, CACM.

[7] Monica S. Lam, et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism, 1991, IEEE Trans. Parallel Distributed Syst.

[8] J. K. Peir. Program partitioning and synchronization on multiprocessor systems, 1986.

[9] Barbara M. Chapman, et al. Supercompilers for parallel and vector computers, 1990, ACM Press frontier series.

[10] Michael Wolfe, et al. High performance compilers for parallel computing, 1995.

[11] Yves Robert, et al. Scheduling and Automatic Parallelization, 2000, Birkhäuser Boston.

[12] Pierre Boulet, et al. Loop Parallelization Algorithms: From Parallelism Extraction to Code Generation, 1998, Parallel Comput.

[13] William Pugh, et al. Selecting Affine Mappings Based on Performance Estimation, 1994, Parallel Process. Lett.

[14] Jih-Kwon Peir, et al. Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors, 1989, IEEE Trans. Computers.

[15] Paul Feautrier. Some efficient solutions to the affine scheduling problem, 1992.

[16] Sanjay V. Rajopadhye, et al. Generation of Efficient Nested Loops from Polyhedra, 2000, International Journal of Parallel Programming.

[17] Alain Darte, et al. Loop Shifting for Loop Compaction, 1999, LCPC.

[18] Alain Darte, et al. On the complexity of loop fusion, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).

[19] Kunio Okuda, et al. Cycle Shrinking by Dependence Reduction, 1996, Euro-Par, Vol. I.

[20] Monica S. Lam, et al. Maximizing parallelism and minimizing synchronization with affine transforms, 1997, POPL '97.

[21] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[22] Scott A. Mahlke, et al. High-level synthesis of nonprogrammable hardware accelerators, 2000, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors.

[23] Paul Feautrier, et al. Construction of Do Loops from Systems of Affine Constraints, 1995, Parallel Process. Lett.

[24] Edwin Hsing-Mean Sha, et al. Polynomial-time nested loop fusion with full parallelism, 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.