A Comparison of Compiler Tiling Algorithms

Linear algebra codes contain data locality which can be exploited by tiling multiple loop nests. Several approaches to tiling have been suggested for avoiding conflict misses in low-associativity caches. We propose a new technique based on intra-variable padding and compare its performance with existing techniques. Results show that padding improves the performance of matrix multiply by over 100% in some cases across a range of matrix sizes. Comparing the efficacy of different tiling algorithms, we find that rectangular tiles are slightly more efficient than square tiles. Overall, tiling improves performance by between 0% and 250%. Copying tiles at run time also proves to be quite effective.
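The two ideas the abstract combines, tiling a loop nest and intra-variable padding, can be illustrated with a small sketch. It is not the paper's implementation: the function names (`pack`, `unpack`, `tiled_matmul`), the row-major layout, and the parameters `tile_h`, `tile_w`, and `pad` are assumptions made for illustration. Padding here means lengthening each row of an array by `pad` unused elements, so that rows of the same array map to different cache sets. In Python the sketch shows only the index arithmetic and demonstrates no performance effect.

```python
def pack(matrix, pad):
    """Flatten a square list-of-lists into a row-major list with `pad` unused slots per row."""
    flat = []
    for row in matrix:
        flat.extend(row)
        flat.extend([0.0] * pad)  # intra-variable padding: unused elements at the end of each row
    return flat


def unpack(flat, n, pad):
    """Recover the n x n list-of-lists from a padded row-major list."""
    ld = n + pad  # leading dimension (padded row length)
    return [flat[i * ld:i * ld + n] for i in range(n)]


def tiled_matmul(a, b, n, tile_h, tile_w, pad=0):
    """Compute C = A * B for padded row-major n x n inputs, using tile_h x tile_w tiles of B."""
    ld = n + pad
    c = [0.0] * (n * ld)
    for kk in range(0, n, tile_h):        # first row of the current tile of B
        for jj in range(0, n, tile_w):    # first column of the current tile of B
            for i in range(n):
                for k in range(kk, min(kk + tile_h, n)):
                    a_ik = a[i * ld + k]
                    for j in range(jj, min(jj + tile_w, n)):
                        c[i * ld + j] += a_ik * b[k * ld + j]
    return c
```

Choosing `tile_h != tile_w` gives a rectangular tile, and `tile_h == tile_w` gives a square one. The result does not depend on the tile shape or on `pad`. In a compiled language, those parameters would affect only which cache lines conflict.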
