Mappings for communication minimization using distribution and alignment

This paper presents results on communication minimization for systems of affine recurrence equations. We show how the dependences of such a system can be decomposed into two classes: auto-dependences and crossed-dependences. We then show how localization can be achieved in two steps: a distribution of the auto-dependences and an alignment of the crossed-dependences. Since it is generally not possible to localize all the dependences of a problem, we finally introduce a heuristic to minimize communication globally.
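
As a rough illustration of the decomposition into two classes, the sketch below splits a list of affine dependences into auto-dependences and crossed-dependences. It assumes, since the abstract does not spell this out, that an auto-dependence is one where a variable depends on other instances of itself, and a crossed-dependence is one between two different variables. The `Dependence` record, its field names, and the sample system are hypothetical and chosen only for the example; they are not the paper's notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependence:
    """Hypothetical record: target[z] reads source[matrix * z + offset]."""
    target: str    # variable being defined
    source: str    # variable being read
    matrix: tuple  # rows of the linear part of the affine index function
    offset: tuple  # constant part of the affine index function


def split_dependences(deps):
    """Split dependences into (auto, crossed) under the assumed definitions."""
    auto = [d for d in deps if d.target == d.source]
    crossed = [d for d in deps if d.target != d.source]
    return auto, crossed


# Illustrative two-variable system (not taken from the paper).
deps = [
    Dependence("X", "X", ((1, 0), (0, 1)), (-1, 0)),  # X[i,j] reads X[i-1,j]
    Dependence("X", "Y", ((0, 1), (1, 0)), (0, 0)),   # X[i,j] reads Y[j,i]
    Dependence("Y", "Y", ((1, 0), (0, 1)), (0, -1)),  # Y[i,j] reads Y[i,j-1]
]

auto, crossed = split_dependences(deps)
```

Under these assumptions, the `auto` list would feed the distribution step and the `crossed` list the alignment step; the sketch covers only the classification, not either of those steps or the global heuristic.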
