Data alignment of loop nests without nonlocal communications

This paper addresses how to distribute data among memory modules, and computations among processors, in a distributed-memory parallel computer so that execution requires no nonlocal communications, or as few as possible. Nonlocal communications are much more expensive than local communications such as nearest-neighbor shifts of data. Algorithms are classified into uniform communication algorithms, whose communication patterns are regular, and affine communication algorithms, whose communication patterns are affine functions of the loop index variables. Necessary and sufficient conditions for the existence of mappings without nonlocal communications are presented. When such mappings exist, the constraints they impose are constructed and used to guide the selection of mappings with minimum local communications.
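The distinction between uniform and affine communication can be illustrated with a small sketch. It assumes linear or affine mappings from loop iterations and array elements onto a processor grid. It also uses a hypothetical cost model: a processor displacement of zero means no communication, a displacement of at most one in every dimension means a local nearest-neighbor shift, and anything larger is nonlocal. The function names and this cost model are illustrative assumptions. They are not the paper's formulation or its necessary and sufficient conditions.

```python
from typing import List, Sequence

Matrix = List[List[int]]
Vector = List[int]


def mat_vec(m: Matrix, v: Sequence[int]) -> Vector:
    """Multiply an integer matrix by an integer vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_mat(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two integer matrices (a is r x k, b is k x c)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def classify_offset(offset: Sequence[int]) -> str:
    """Hypothetical cost model: zero displacement means no communication,
    every component in {-1, 0, 1} means local (nearest neighbor), else nonlocal."""
    if all(x == 0 for x in offset):
        return "none"
    if all(abs(x) <= 1 for x in offset):
        return "local"
    return "nonlocal"


def classify_uniform(alloc: Matrix, deps: List[Vector]) -> List[str]:
    """Uniform case: iteration i runs on processor alloc @ i, and each
    constant dependence vector d links i to i + d, so the processor
    displacement alloc @ d does not depend on i."""
    return [classify_offset(mat_vec(alloc, d)) for d in deps]


def classify_affine(comp: Matrix, c: Vector, align: Matrix, e: Vector,
                    access: Matrix, f: Vector) -> str:
    """Affine case: iteration i runs on processor comp @ i + c and reads the
    element access @ i + f, which is stored on align @ (access @ i + f) + e.
    The displacement is (align @ access - comp) @ i + (align @ f + e - c).
    Over an unbounded (or sufficiently large) iteration domain it stays
    bounded only if align @ access == comp; this sketch assumes such a domain."""
    if mat_mat(align, access) != comp:
        return "nonlocal"  # displacement grows with the loop indices
    offset = [a + b - d for a, b, d in zip(mat_vec(align, f), e, c)]
    return classify_offset(offset)


# Uniform: dependences (1,0) and (0,1), iterations mapped by their first index.
print(classify_uniform([[1, 0]], [[1, 0], [0, 1]]))

# Affine: iteration (i, j) runs on processor i and reads B[j, i].
# Distributing B by columns (B[x, y] on processor y) removes communication.
print(classify_affine([[1, 0]], [0], [[0, 1]], [0], [[0, 1], [1, 0]], [0, 0]))
# Distributing B by rows makes the transposed read nonlocal.
print(classify_affine([[1, 0]], [0], [[1, 0]], [0], [[0, 1], [1, 0]], [0, 0]))
```

In this sketch a uniform dependence produces the same displacement at every iteration, so a single matrix-vector product classifies it. An affine access instead yields a displacement that varies with the loop indices unless the alignment cancels the linear part. That is why the data alignment (`align`) and the computation mapping (`comp`) must be chosen together.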
