Communication-Free Alignment for Array References with Linear Subscripts in Three Loop Index Variables or Quadratic Subscripts

Bau et al. proposed an efficient and precise data alignment method for determining whether array reference functions with linear subscripts in one loop index variable admit a communication-free alignment. Chu et al. presented an efficient and precise method for array reference functions with linear subscripts in two loop index variables or with quadratic subscripts of the form ai² + bi + d. However, neither method applies to array reference functions with linear subscripts in three loop index variables or with quadratic subscripts of the form ai² + bi + cj + d. In this paper, we propose two new alignment functions, one for the loop iteration space and one for array elements. These functions can be used to check whether array reference functions with linear subscripts in three loop index variables, or with quadratic subscripts, admit a communication-free alignment. Experiments on benchmarks taken from the Parallel Loops and Vector Loops suites showed that the proposed method improved data alignment in three of the seven nested loops tested.
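To make the property being tested concrete, the sketch below checks communication-free partitioning by brute force on a small, concrete iteration space: every iteration is placed in the same group as every array element it references, and the number of independent groups is counted. More than one group means the iterations can be spread over processors with no interprocessor communication; a single group means they cannot. This is a swapped-in illustrative technique (union-find enumeration), not the analytical alignment functions proposed in the paper, and the names `count_comm_free_partitions`, `UnionFind`, and the `(array_name, subscript_function)` representation of a reference are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Index = Tuple[int, ...]
# Hypothetical representation: a reference is (array_name, subscript_function),
# where the function maps a tuple of loop indices to an integer subscript.
Reference = Tuple[str, Callable[[Index], int]]


class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def count_comm_free_partitions(bounds: Sequence[range], refs: List[Reference]) -> int:
    """Colocate each iteration with every element it references and return
    the number of independent groups (brute force over concrete bounds)."""
    uf = UnionFind()
    iterations = list(product(*bounds))
    for it in iterations:
        node = ("iter", it)
        uf.find(node)  # register iterations that touch no array
        for name, subscript in refs:
            uf.union(node, (name, subscript(it)))
    return len({uf.find(("iter", it)) for it in iterations})


if __name__ == "__main__":
    R = range(3)
    # Linear subscript in three loop index variables: A(i + j + k)
    print(count_comm_free_partitions([R, R, R], [("A", lambda t: t[0] + t[1] + t[2])]))  # 7
    # Quadratic subscript A(i*i + j) (a = 1, b = 0, c = 1, d = 0) together with B(i + j)
    print(count_comm_free_partitions(
        [R, R], [("A", lambda t: t[0] ** 2 + t[1]), ("B", lambda t: t[0] + t[1])]))  # 5
    # A(i + j) with B(i): every iteration ends up in one group
    print(count_comm_free_partitions(
        [R, R], [("A", lambda t: t[0] + t[1]), ("B", lambda t: t[0])]))  # 1
```

Enumeration only works for fixed, small bounds and grows with the iteration-space size, which is why compile-time methods such as those discussed above reason about the subscript functions symbolically instead.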

References

[1] Monica S. Lam, et al. Maximizing Parallelism and Minimizing Synchronization with Affine Partitions. 1998, Parallel Comput.

[2] Keshav Pingali, et al. Solving Alignment Using Elementary Linear Algebra. 2001, Compiler Optimizations for Scalable Parallel Systems Languages.

[3] Mahmut T. Kandemir, et al. A hyperplane based approach for optimizing spatial locality in loop nests. 1998, ICS '98.

[4] Michael Wolfe, et al. High performance compilers for parallel computing. 1995.

[5] David A. Patterson, et al. Computer Architecture: A Quantitative Approach. 1969.

[6] Yves Robert, et al. Alignment and distribution is NOT (always) NP-hard. 1998, Proceedings 1998 International Conference on Parallel and Distributed Systems.

[7] Jang-Ping Sheu, et al. Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers. 2004, The Journal of Supercomputing.

[8] Mahmut T. Kandemir, et al. A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality. 1998, LCPC.

[9] Mahmut T. Kandemir, et al. A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts. 1999, IEEE Trans. Parallel Distributed Syst.

[10] Monica S. Lam, et al. An affine partitioning algorithm to maximize parallelism and minimize communication. 1999, ICS '99.

[11] Jack J. Dongarra, et al. Parallel loops - a test suite for parallelizing compilers: description and example results. 1991, Parallel Comput.

[12] Jack J. Dongarra, et al. A comparative study of automatic vectorizing compilers. 1991, Parallel Comput.

[13] David A. Patterson, et al. Computer architecture (2nd ed.): a quantitative approach. 1996.

[14] J. Edmonds. Systems of distinct representatives and linear algebra. 1967.