On Loop Transformations for Generalized Cycle Shrinking

This paper describes two loop transformation techniques for extracting parallelism from nested loops, so that the loops can be scheduled to run in parallel and execution time is minimized. The first is called selective cycle shrinking, and the second is called true dependence cycle shrinking. It is shown that selective shrinking is related to linear scheduling of nested loops and that true dependence shrinking is related to conflict-free mappings of higher-dimensional algorithms into lower-dimensional processor arrays. Methods are proposed to find the selective and true dependence shrinkings with minimum total execution time by applying the techniques for finding optimal linear schedules and optimal conflict-free mappings proposed by W. Shang and J.A.B. Fortes.
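The connection to linear scheduling can be illustrated with a small sketch. It is not the method of the paper: it assumes a box-shaped index set, uniform (constant) dependence vectors, and a brute-force search over small integer schedule vectors in place of the optimization techniques the abstract refers to. The names `is_valid_schedule`, `schedule_length`, `best_linear_schedule`, and `wavefronts` are hypothetical helpers introduced only for this illustration. A schedule vector `pi` is treated as valid when `pi . d >= 1` for every dependence vector `d`, and iteration `j` is assigned to time step `pi . j`.

```python
from itertools import product


def dot(a, b):
    """Inner product of two integer vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def is_valid_schedule(pi, deps):
    """A linear schedule respects the dependences if pi . d >= 1 for all d."""
    return all(dot(pi, d) >= 1 for d in deps)


def schedule_length(pi, bounds):
    """Number of time steps spanned on the box 0 <= j_k <= bounds[k].

    The extreme values of pi . j occur at corners of the box, so the span
    is sum(|pi_k| * bounds[k]). Some steps in the span may be empty.
    """
    return sum(abs(p) * n for p, n in zip(pi, bounds)) + 1


def best_linear_schedule(deps, bounds, search=3):
    """Exhaustive search (an assumption of this sketch) over pi in
    [-search, search]^n for the valid schedule with the shortest span."""
    best = None
    for pi in product(range(-search, search + 1), repeat=len(bounds)):
        if not is_valid_schedule(pi, deps):
            continue
        length = schedule_length(pi, bounds)
        if best is None or length < best[1]:
            best = (pi, length)
    return best


def wavefronts(pi, bounds):
    """Group iterations by time step; each group can run in parallel."""
    fronts = {}
    for j in product(*(range(n + 1) for n in bounds)):
        fronts.setdefault(dot(pi, j), []).append(j)
    return [fronts[t] for t in sorted(fronts)]


if __name__ == "__main__":
    # Doubly nested loop, 4 x 4 iterations, dependences (1, 0) and (0, 1).
    deps = [(1, 0), (0, 1)]
    bounds = (3, 3)
    pi, steps = best_linear_schedule(deps, bounds)
    print("schedule vector:", pi, "time steps:", steps)
    for t, front in enumerate(wavefronts(pi, bounds)):
        print(t, front)
```

Under these assumptions, the 16 iterations of the example are grouped into wavefronts whose members have no dependences among them, since every dependence moves an iteration to a strictly later time step.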
