IMPACT OF TILE-SIZE SELECTION FOR SKEWED TILING

Tile-size selection is known to be a complex problem. This paper develops a new tile-size selection algorithm targeting relaxation codes. Unlike previous algorithms, it accounts for the effect of loop skewing, which is necessary to tile such codes. It also estimates loop overhead and incorporates it into the execution cost model, which turns out to be critical to the decision between tiling a single loop level and tiling two loop levels. Our preliminary experimental results show that these previously ignored issues have a significant impact on the execution time of tiled loops in relaxation codes. In our experiments, we measured the cache miss rate and the execution time of five benchmark programs on a single processor and compared our algorithm with previous algorithms. Our algorithm achieves average speedups of 1.27 to 1.63 over the other algorithms.
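
To illustrate why skewing is needed before a relaxation code can be tiled, the following is a minimal sketch and not the selection algorithm developed in the paper. It assumes a hypothetical 1-D in-place three-point relaxation; the function names, the stencil, and the `tile` parameter are all illustrative assumptions. The untiled loop nest carries a dependence with distance (1, -1) in (t, i), so rectangular tiles in that space are illegal. Substituting j = i + t makes every dependence distance non-negative, after which the skewed loop can be tiled and the tile loop moved outermost.

```python
def relax_reference(a, steps):
    """Untiled in-place three-point relaxation (hypothetical example kernel)."""
    a = list(a)
    n = len(a)
    for t in range(steps):
        for i in range(1, n - 1):
            a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a


def relax_skewed_tiled(a, steps, tile):
    """Same computation after skewing (j = i + t) and tiling the skewed loop.

    In (t, i) space the dependence distances are (0, 1), (1, 0) and (1, -1).
    After skewing they become (0, 1), (1, 1) and (1, 0), all non-negative,
    so the tile loop over j may legally be moved outside the time loop.
    """
    a = list(a)
    n = len(a)
    # The skewed index j = i + t covers [1, (n - 2) + (steps - 1)].
    j_first, j_last = 1, (n - 2) + (steps - 1)
    for jj in range(j_first, j_last + 1, tile):      # tile loop (outermost)
        for t in range(steps):                       # time loop
            lo = max(jj, t + 1)                      # clip to tile and to i >= 1
            hi = min(jj + tile - 1, t + n - 2)       # clip to tile and to i <= n - 2
            for j in range(lo, hi + 1):
                i = j - t                            # undo the skew for array access
                a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a


if __name__ == "__main__":
    data = [float((k * k) % 7) for k in range(20)]
    # Both versions perform identical updates in a dependence-preserving order.
    print(relax_skewed_tiled(data, 5, 4) == relax_reference(data, 5))
```

In this sketch the value of `tile` is simply a parameter. Choosing it, and deciding whether the time loop should be tiled as well, is the selection problem the paper addresses with its cost model.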
