The alignment problem in a linear algebra framework

Automatically parallelizing loop nests for massively parallel distributed-memory computers raises two main concerns: maximizing parallelism and minimizing the communication overhead caused by nonlocal data accesses. This paper studies the problem of finding a computation mapping and data distributions that minimize the number of remote data accesses for a given degree of parallelism. This problem, called the constant-degree parallelism alignment problem, is shown to be NP-hard. The algorithm presented uses a linear algebra framework and assumes affine data access functions. It proceeds by enumerating all interesting bases of the set of vectors that represent the alignments between computation and data accesses that should be satisfied. A comparison with related work shows that previous results can be expressed as special cases of this approach. Applied to benchmark programs, the algorithm is shown to be superior to more basic mappings.
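
To make the notion of an alignment between a computation and a data access concrete, here is a minimal sketch. It assumes one common linear-algebra formulation and is not taken from the paper itself. An access with affine function F is treated as local when the computation mapping M_S equals the product M_a F, where M_a is the mapping of the accessed array. Constant offsets are ignored. The loop nest, the matrices, and the names `matmul`, `count_local_accesses`, `ACCESSES`, `COMP_MAP`, `ALIGNED` and `NAIVE` are all hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: counts how many affine accesses are local under
# given linear mappings, assuming the condition M_S == M_a * F (offsets ignored).

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def count_local_accesses(comp_map, data_maps, accesses):
    """Return the number of accesses whose alignment condition is satisfied.

    comp_map  -- m x d matrix mapping iteration vectors to virtual processors
    data_maps -- dict: array name -> m x q matrix mapping array indices
    accesses  -- list of (array name, q x d linear part F of the access)
    """
    local = 0
    for array, F in accesses:
        if matmul(data_maps[array], F) == comp_map:
            local += 1
    return local

# Hypothetical statement inside a 2-deep loop nest over (i, j):
#     a[i][j] = b[j][i] + c[i]
ACCESSES = [
    ("a", [[1, 0], [0, 1]]),  # a[i][j]
    ("b", [[0, 1], [1, 0]]),  # b[j][i]
    ("c", [[1, 0]]),          # c[i]
]

# Degree of parallelism 1: iteration (i, j) runs on virtual processor i.
COMP_MAP = [[1, 0]]

# Distribute a by rows, b by columns, c by its only index.
ALIGNED = {"a": [[1, 0]], "b": [[0, 1]], "c": [[1]]}
# Same, but b is distributed by rows, so b[j][i] becomes a remote access.
NAIVE = {"a": [[1, 0]], "b": [[1, 0]], "c": [[1]]}

if __name__ == "__main__":
    print(count_local_accesses(COMP_MAP, ALIGNED, ACCESSES))
    print(count_local_accesses(COMP_MAP, NAIVE, ACCESSES))
```

In this toy instance the two candidate data distributions differ only in how b is mapped. Maximizing the count over all mappings of a fixed rank is the kind of optimization the abstract describes. The sketch only evaluates given candidates and does not implement the enumeration of bases.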
