Visualizing the impact of the cache on program execution

The global cache miss ratio of a program does not reveal in detail how its memory reference patterns are distributed over time. Cache visualization, on the other hand, is hampered by the huge number of memory references that must be displayed. Many visualizers therefore show only a snapshot of the cache contents instead of all memory transactions. This paper introduces a cache visualizer that presents the cache behavior over the entire program execution in several complementary views: the density view of the cache misses shows the hot spots of the program; the reuse distance view shows the data locality and its effect on performance; the histogram view shows the periodic patterns that occur in the trace. In a number of experiments, the visualizer has been used to characterize the cache behavior of programs and, on that basis, to improve both cache behavior and program performance.
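
The reuse distance view rests on a quantity that is easy to state in code. The following is a minimal sketch, not the visualizer's implementation: it assumes the common definition of reuse distance as the number of distinct other blocks referenced between two consecutive references to the same block, and it assumes a fully associative LRU cache when turning distances into misses. The function names and the toy trace are hypothetical.

```python
from collections import Counter


def reuse_distances(trace):
    """For each reference, return the number of distinct other blocks
    touched since the previous reference to the same block.
    A first-time reference has no reuse distance and yields None."""
    stack = []  # LRU stack: the most recently used block is last
    distances = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            # Blocks above `pos` were touched since the last use of `block`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(block)
    return distances


def lru_misses(distances, capacity):
    """Count misses in a fully associative LRU cache of `capacity` blocks:
    a reference misses if it is a first use or its distance >= capacity."""
    return sum(1 for d in distances if d is None or d >= capacity)


# Hypothetical toy trace of cache-block identifiers.
trace = ["a", "b", "c", "a", "b", "d", "a"]
dists = reuse_distances(trace)

# Histogram of the finite reuse distances, the raw data behind such a view.
histogram = Counter(d for d in dists if d is not None)
```

The list-based stack costs time linear in the number of distinct blocks per reference, which is acceptable for a sketch; a real trace with millions of references would call for a tree-based structure.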
