Cache Behavior Analysis Without Profiling.

The growing gap between processor and main memory speed makes it necessary to exploit the caches as fully as possible in order to obtain reasonable program execution speed. Many program transformations have been proposed to improve cache behavior. However, to get the most out of such optimizations, the cache behavior must first be determined. In contrast to profile-driven measurement of the cache behavior, this paper presents a method that derives it from the structure of the loops in the program. Because no profiling is involved, the time needed to calculate the cache behavior is independent of the program's input data. Furthermore, in contrast to other techniques that calculate cache behavior at compile time, the presented technique is exact and handles fully associative caches efficiently. The efficiency originates from the use of an intermediate data locality metric: the reuse distance. From the reuse distance, the hit/miss behavior of a memory access can be easily determined. Finally, the use of this analysis in a cache optimization phase of an EPIC compiler for the Itanium processor is shown as an example of its practicality.
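To illustrate how hit/miss behavior follows from the reuse distance, the sketch below assumes the common definition: the reuse distance of an access is the number of distinct addresses touched since the previous access to the same address, and the access hits in a fully associative LRU cache of n lines exactly when that distance is smaller than n. Note that it computes distances from an explicit address trace purely for illustration, whereas the method of the paper derives them from the loop structure without any trace. The names `reuse_distances` and `hits` are hypothetical.

```python
from math import inf


def reuse_distances(trace):
    """Return the reuse distance of every access in the trace.

    Assumed definition: the number of distinct addresses accessed between
    an access and the previous access to the same address; a first access
    has infinite distance (cold miss).
    """
    last_seen = {}  # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1:i]
            distances.append(len(set(between)))
        else:
            distances.append(inf)
        last_seen[addr] = i
    return distances


def hits(trace, cache_lines):
    """Hit/miss per access for a fully associative LRU cache (assumed model)."""
    return [d < cache_lines for d in reuse_distances(trace)]


trace = ["a", "b", "c", "a", "b", "b"]
print(reuse_distances(trace))  # [inf, inf, inf, 2, 2, 0]
print(hits(trace, 2))          # only the last access hits
print(hits(trace, 3))          # the last three accesses hit
```

The quadratic slicing is kept for readability; it is a naive illustration of the definition, not an efficient analysis.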
