Blocking and array contraction across arbitrarily nested loops using affine partitioning

Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and statement reordering. Algorithms based on affine partitioning have been shown to be effective for parallelization and communication minimization. This paper presents algorithms that improve data locality using affine partitioning. Blocking and array contraction are two important optimizations that have been shown to be useful for data locality. Blocking creates a set of inner loops so that data brought into the faster levels of the memory hierarchy can be reused. Array contraction reduces an array to a scalar variable, thereby reducing both the number of memory operations executed and the memory footprint. Loop transforms are often necessary to make blocking and array contraction possible. By bringing the full generality of affine partitioning to bear on the problem, our locality algorithm can find more contractable arrays than previously possible. This paper also generalizes the concept of blocking and shows that affine partitioning allows the benefits of blocking to be realized in arbitrarily nested loops. Experimental results on a number of benchmarks and a complete multigrid application in aeronautics indicate that affine partitioning is effective in practice.
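The two optimizations named in the abstract can be illustrated with a small hand-written sketch. It is not the paper's affine partitioning algorithm: the transformations below are applied manually to textbook loops, and all function names (`unfused`, `fused_contracted`, `matmul_blocked`) and the tile-size parameter `bs` are hypothetical names chosen for illustration. The first pair of functions shows how fusing two loops lets a temporary array contract to a scalar; the last shows blocking of a matrix multiply so that each tile is reused while it is still in fast memory.

```python
def unfused(a):
    """Two separate loops communicate through a full temporary array."""
    n = len(a)
    tmp = [0] * n
    for i in range(n):
        tmp[i] = 2 * a[i]          # producer loop writes every tmp[i]
    b = [0] * n
    for i in range(n):
        b[i] = tmp[i] + 1          # consumer loop reads every tmp[i]
    return b


def fused_contracted(a):
    """After fusion, each value is produced and consumed in the same
    iteration, so the array tmp contracts to the scalar t."""
    n = len(a)
    b = [0] * n
    for i in range(n):
        t = 2 * a[i]               # scalar replaces tmp[i]
        b[i] = t + 1
    return b


def matmul_blocked(x, y, n, bs):
    """Blocked n-by-n matrix multiply with bs-by-bs tiles (bs is an
    assumed tuning parameter). The three outer loops walk over tiles;
    the three inner loops reuse the data of one tile combination."""
    z = [[0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        xik = x[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            z[i][j] += xik * y[k][j]
    return z
```

In this sketch the fusion is trivially legal because iteration `i` of the consumer depends only on iteration `i` of the producer. The abstract's point is that, in general, loop transforms such as reindexing or statement reordering may be needed before fusion, contraction, or blocking becomes possible.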
