Timekeeping in the memory system: predicting and optimizing memory behavior

Techniques for analyzing and improving memory referencing behavior remain important for achieving good overall program performance because of the ever-increasing performance gap between processors and main memory. This paper offers a fresh perspective on the problem of predicting and optimizing memory behavior. Specifically, we show quantitatively that detailed timing characteristics of past memory reference events are strongly predictive of future program reference behavior. We propose a family of timekeeping techniques that optimize behavior based on observations about particular cache time durations, such as the cache access interval or the cache dead time. Timekeeping techniques can be used to build small, simple, and high-accuracy (often 90% or more) predictors for identifying conflict misses, for predicting dead blocks, and even for estimating the time at which the next reference to a cache frame will occur and the address that will be accessed. Based on these predictors, we demonstrate two new and complementary time-based hardware structures: (1) a time-based victim cache that improves performance by storing only conflict-miss lines that are likely to be reused, and (2) a time-based prefetching technique that homes in on the right address to prefetch and the right time to schedule the prefetch. Our victim cache technique improves performance over previous proposals by selecting more carefully what to place in the victim cache. Our prefetching technique outperforms similar prior hardware prefetching proposals, despite being orders of magnitude smaller. Overall, these techniques improve performance by more than 11% across the SPEC2000 benchmark suite.
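
To make the two time durations concrete, the following is a minimal Python sketch of per-frame timekeeping in a direct-mapped cache. It is an illustration under stated assumptions, not the hardware design evaluated in the paper: the class name `TimedDirectMappedCache`, the `dead_time_threshold` parameter, and the rule that a short dead time marks an evicted line as a victim-cache candidate are all hypothetical choices made for this example. Here the access interval is taken to be the time between successive hits to the same frame, and the dead time is the time between the last reference to a line and its eviction.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    tag: Optional[int] = None  # tag of the resident line, None if empty
    last_access: int = 0       # time of the most recent reference


class TimedDirectMappedCache:
    """Direct-mapped cache model that records timing per frame (illustrative)."""

    def __init__(self, num_frames: int, dead_time_threshold: int) -> None:
        self.frames = [Frame() for _ in range(num_frames)]
        # Assumed filter: evicted lines with a dead time below this value
        # are treated as likely conflict misses worth keeping.
        self.threshold = dead_time_threshold
        self.access_intervals: List[int] = []
        self.victim_candidates: List[Tuple[int, int]] = []

    def access(self, block_addr: int, now: int) -> bool:
        """Reference block_addr at time `now`; return True on a hit."""
        index = block_addr % len(self.frames)
        tag = block_addr // len(self.frames)
        frame = self.frames[index]

        if frame.tag == tag:
            # Hit: the gap since the previous reference is an access interval.
            self.access_intervals.append(now - frame.last_access)
            frame.last_access = now
            return True

        if frame.tag is not None:
            # Miss that evicts a resident line: measure that line's dead time.
            dead_time = now - frame.last_access
            if dead_time < self.threshold:
                evicted_addr = frame.tag * len(self.frames) + index
                self.victim_candidates.append((evicted_addr, dead_time))

        frame.tag = tag
        frame.last_access = now
        return False
```

In this sketch, a line evicted shortly after its last use is flagged as a candidate for the victim cache, while a line that sat unused for a long time before eviction is not. The threshold value and time units are free parameters of the example.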
