Automated cache optimizations using CME-driven diagnosis

Demonstrating our framework on a collection of scientific loop nests, we reduced cache misses by an average of 84% in the optimizable loop nests. This work lays the groundwork for handling a wider range of optimizations through further study of solution patterns in the cache miss equation (CME) solution table.
