Automatic data allocation to minimize communication on SIMD machines

Straightforward compilation of array operations onto massively parallel SIMD machines results in a significant amount of interprocessor data motion. Careful allocation of data across the processors eliminates much of this interprocessor data motion. Researchers are working on extending programming languages to include user directives for specifying good data allocation. Our focus is to automate the data allocation through compiler techniques to achieve portability without sacrificing efficiency. These techniques can be used to fully automate the data allocation process or can be integrated with alignment directives.

We present here a complete compiler algorithm for the automatic layout of data to minimize interprocessor data motion. Arrays are aligned by mapping them onto the processors based on their usage. Arrays may be mapped differently in different sections of the program, eliminating much of the interprocessor data motion resulting from a static mapping of arrays. We describe an integrated technique for determining the alignment of arrays locally within regions of the program and minimizing communication globally among these regions. This technique starts with the alignments specified by the directives, if any, and determines the alignment for the remaining arrays.

The algorithms proposed in this paper were used in the SIMD compilers at Compass, Inc. Preliminary results from the initial implementation of the data optimization techniques described here suggest a significant decrease of the interprocessor data motion. More analysis is required to better understand the range of expected gains and the conditions under which those gains are achieved.
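To make the idea of usage-based alignment concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the algorithm of the paper. It assumes one-dimensional arrays with one element per processor, models each statement as `lhs[i] = rhs[i + shift]`, and treats an alignment as a single integer offset per array. The names `STATEMENTS`, `motion_cost`, and `best_alignment` are hypothetical, the sample program is invented for illustration, and boundary effects at the array ends are ignored.

```python
from itertools import product

# Each statement "lhs[i] = rhs[i + shift]" is modelled as (lhs, rhs, shift).
# Hypothetical toy program, not taken from the paper.
STATEMENTS = [("A", "B", 1), ("C", "B", 1), ("A", "C", 0)]


def motion_cost(offsets, statements, n=8):
    """Count elements that must cross processors under the given offsets.

    Assumption: element X[i] lives on processor i + offsets[X].
    """
    cost = 0
    for lhs, rhs, shift in statements:
        # lhs[i] sits on processor i + offsets[lhs];
        # rhs[i + shift] sits on processor i + shift + offsets[rhs].
        if offsets[lhs] != shift + offsets[rhs]:
            cost += n  # in this toy model, all n elements move
    return cost


def best_alignment(statements, max_offset=2):
    """Exhaustively search small offsets; a stand-in for a real algorithm."""
    arrays = sorted({name for lhs, rhs, _ in statements for name in (lhs, rhs)})
    candidates = (
        dict(zip(arrays, combo))
        for combo in product(range(-max_offset, max_offset + 1), repeat=len(arrays))
    )
    return min(candidates, key=lambda off: motion_cost(off, statements))


naive = {"A": 0, "B": 0, "C": 0}
aligned = best_alignment(STATEMENTS)
print(motion_cost(naive, STATEMENTS), motion_cost(aligned, STATEMENTS))
```

In this toy program the three statements impose mutually consistent constraints, so an offset assignment with no data motion exists, whereas placing every array at offset zero moves data for two of the three statements. Exhaustive search is used only to keep the sketch short; it does not scale and does not model remapping arrays between program regions.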
