A compiler algorithm for optimizing locality in loop nests

This paper describes an algorithm to optimize cache locality in scientific codes on uniprocessor and multiprocessor machines. A distinctive characteristic of our algorithm is that it considers loop and data layout transformations in a unified framework. We illustrate through examples that our approach is very effective at reducing cache misses and the tile-size sensitivity of blocked loop nests, and that it can optimize nests for which optimization techniques based on loop transformations alone are not successful. An important special case is the one in which the data layouts of some arrays are fixed and cannot be changed. We show how our algorithm can handle this case, and demonstrate how it can be used to optimize multiple loop nests.
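
To make concrete the claim that some nests cannot be optimized by loop transformations alone, the following is a minimal illustrative sketch, not the algorithm of the paper. It assumes a hypothetical two-deep nest whose body is `A[i][j] = B[j][i]`, a hypothetical extent `N`, and a simplified locality measure: the memory stride between consecutive iterations of the innermost loop, where a stride of 1 stands for good spatial locality. All names (`address`, `inner_stride`, `strides`) are introduced here for illustration only.

```python
from itertools import permutations

N = 8  # hypothetical array extent, chosen only for illustration


def address(index, layout):
    """Linear offset of a 2-D element under a row- or column-major layout."""
    r, c = index
    return r * N + c if layout == "row" else c * N + r


def inner_stride(subscript, loop_order, layout):
    """Memory distance between accesses of consecutive innermost iterations."""
    inner = loop_order[-1]
    point = {"i": 0, "j": 0}
    before = address(subscript(point), layout)
    point[inner] += 1  # advance the innermost loop by one iteration
    after = address(subscript(point), layout)
    return after - before


# Assumed nest body: A[i][j] = B[j][i]
refs = {
    "A": lambda p: (p["i"], p["j"]),
    "B": lambda p: (p["j"], p["i"]),
}


def strides(loop_order, layouts):
    """Innermost-loop stride of every reference for a loop order and layouts."""
    return {
        name: inner_stride(sub, loop_order, layouts[name])
        for name, sub in refs.items()
    }


# Loop transformations only: both arrays stay row-major, try both loop orders.
row_major = {"A": "row", "B": "row"}
loop_only = {order: strides(order, row_major) for order in permutations("ij")}

# Loop order (i, j) combined with a column-major layout for B.
combined = strides(("i", "j"), {"A": "row", "B": "col"})

print(loop_only)
print(combined)
```

Under these assumptions, either loop order leaves one of the two arrays with a stride of `N` when both are row-major, whereas storing `B` column-major gives both references unit stride. If the layout of `B` were fixed as row-major (the special case mentioned above), only the loop order would remain available in this toy model.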
