Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

We present an approach for synthesizing transformations that enhance locality in imperfectly-nested loop nests. The key idea is to embed the iteration space of every statement in a loop nest into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques such as code sinking and loop fusion, which current compilers use in ad hoc ways to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques, however, our embeddings are chosen carefully to enhance locality. The product space is then transformed further to enhance locality, after which fully permutable loops are tiled and code is generated. We evaluate the effectiveness of this approach on dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks.
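As a rough illustration of code sinking, one of the ad hoc techniques that embedding is said to generalize, the sketch below turns a small imperfectly-nested loop into a perfectly-nested one. The row-sum computation and the function names `row_sums_imperfect` and `row_sums_sunk` are hypothetical, chosen only for this example and not taken from the paper. The choice to place the outer statement at `j == 0` is one possible embedding of its one-dimensional iteration space into the two-dimensional space; the paper's point is that such placements should be chosen with locality in mind, which this sketch does not attempt.

```python
def row_sums_imperfect(a):
    """Imperfectly-nested version: S1 sits in the i loop only, S2 in the (i, j) nest.

    Assumes `a` is a square n-by-n matrix given as a list of lists.
    """
    n = len(a)
    s = [None] * n
    for i in range(n):
        s[i] = 0                  # S1: iteration space is {i}
        for j in range(n):
            s[i] += a[i][j]       # S2: iteration space is {(i, j)}
    return s


def row_sums_sunk(a):
    """Perfectly-nested version obtained by sinking S1 into the j loop.

    S1's iteration i is mapped to the point (i, 0) of the two-dimensional
    space (an illustrative choice), and a guard restricts S1 to that point.
    """
    n = len(a)
    s = [None] * n
    for i in range(n):
        for j in range(n):
            if j == 0:            # guard: S1 runs only at its embedded point (i, 0)
                s[i] = 0
            s[i] += a[i][j]       # S2 runs at every point (i, j)
    return s


if __name__ == "__main__":
    matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(row_sums_imperfect(matrix))
    print(row_sums_sunk(matrix))
```

Both functions compute the same result; the second has every statement inside the innermost loop, so transformations defined for perfect nests, such as permutation or tiling, can be considered for the nest as a whole once legality is checked.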
