Full Parallelism in Uniform Nested Loops Using Multi-Dimensional Retiming

Most scientific and DSP applications are recursive or iterative. Uniform nested loops can be modeled as multi-dimensional data flow graphs (DFGs). Achieving full parallelism of the loop body, i.e., executing all of its computational nodes in parallel, substantially decreases the overall computation time. It is well known that retiming cannot always achieve full parallelism for one-dimensional DFGs. This paper presents an important and counter-intuitive result: full parallelism can always be obtained for DFGs with more than one dimension. It also presents two novel multi-dimensional retiming techniques for obtaining full parallelism.
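
The idea can be illustrated with a minimal sketch. It is not one of the two techniques presented in the paper. It assumes a two-dimensional DFG stored as a list of edges with integer delay vectors, the common retiming convention d_r(u -> v) = d(u -> v) + r(u) - r(v), and a schedule-vector legality test (s . d > 0 on every edge). The names `retime`, `fully_parallel`, and `legal_under`, and the two-node example graph, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vec = Tuple[int, int]
Edge = Tuple[str, str, Vec]  # (source node, target node, delay vector)


def retime(edges: List[Edge], r: Dict[str, Vec]) -> List[Edge]:
    """Apply retiming r under the assumed convention
    d_r(u -> v) = d(u -> v) + r(u) - r(v). Nodes missing from r get (0, 0)."""
    zero = (0, 0)
    out = []
    for u, v, d in edges:
        ru, rv = r.get(u, zero), r.get(v, zero)
        out.append((u, v, (d[0] + ru[0] - rv[0], d[1] + ru[1] - rv[1])))
    return out


def fully_parallel(edges: List[Edge]) -> bool:
    """The loop body is fully parallel when no edge has a zero delay vector,
    i.e. no node depends on another node of the same iteration."""
    return all(d != (0, 0) for _, _, d in edges)


def legal_under(edges: List[Edge], s: Vec) -> bool:
    """Assumed legality test: every delay vector has a positive inner
    product with the schedule vector s."""
    return all(s[0] * d[0] + s[1] * d[1] > 0 for _, _, d in edges)


# Hypothetical two-node cycle: A feeds B within the same iteration,
# and B feeds A one iteration later in both loop dimensions.
edges = [("A", "B", (0, 0)), ("B", "A", (1, 1))]
retimed = retime(edges, {"A": (0, 1)})
```

In this example the original graph is not fully parallel because the edge from A to B carries a zero delay. After retiming A by (0, 1), the delays become (0, 1) and (1, 0), so both nodes can execute in parallel, and the schedule vector (1, 1) satisfies the assumed legality test.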
