A Compiler Framework for Tiling Imperfectly-Nested Loops

This paper presents an integrated compiler framework for tiling a class of nontrivial imperfectly-nested loops to improve cache locality. We develop a new memory cost model that analyzes data reuse in terms of both the cache and the TLB, and we use this model to compute the tile size with or without array duplication. We decide whether to duplicate arrays for tiling by comparing the reuse factors exploited in each case. Preliminary results on several benchmark programs show that the transformed programs achieve speedups of 1.09 to 3.82 over the original programs.
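As a generic illustration of what tiling an imperfectly-nested loop involves, the sketch below uses a matrix-vector product in C. It is not the framework described in this paper: the function names `matvec_orig` and `matvec_tiled` and the `tile` parameter are hypothetical, the tile size is supplied by the caller rather than derived from a cost model, and no array duplication is performed.

```c
#include <stddef.h>

/* Original loop nest. It is imperfectly nested because the
 * initialization of y[i] sits outside the inner j loop. */
void matvec_orig(size_t n, const double *a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++) {
        y[i] = 0.0;
        for (size_t j = 0; j < n; j++)
            y[i] += a[i * n + j] * x[j];
    }
}

/* Hand-tiled version (assumes tile > 0). The initialization is
 * peeled into its own loop so the remaining nest is perfect, and
 * the j loop is split into tiles so that each tile of x is reused
 * across all rows i before the next tile is touched. */
void matvec_tiled(size_t n, size_t tile, const double *a,
                  const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = 0.0;
    for (size_t jj = 0; jj < n; jj += tile) {
        size_t jend = (jj + tile < n) ? jj + tile : n;
        for (size_t i = 0; i < n; i++)
            for (size_t j = jj; j < jend; j++)
                y[i] += a[i * n + j] * x[j];
    }
}
```

Both versions add the terms of each `y[i]` in the same order, so they produce the same result; only the memory access pattern differs. Choosing `tile` well is the part that a memory cost model such as the one in this paper is meant to address.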
