STATSHARE: A Statistical Model for Managing Cache Sharing via Decay

As chip multiprocessors (CMPs) emerge as the dominant architecture for a wide range of platforms (from embedded systems and game consoles to PCs and servers), managing on-chip resources becomes a necessity. In this paper we examine the management of on-chip shared caches and make two major contributions. First, we propose a new statistical model of a shared cache that can be fed with run-time information, namely reuse-distance information for thread accesses. Our model, called StatShare, accurately describes the behavior of the sharing threads and helps us understand which threads can be “compressed” into less space without perceptible damage and how effectively each thread uses its space. Second, we propose a mechanism to manage the cache at a very fine level, the cache-line granularity. Our mechanism is inspired by cache decay, but with some important differences: decayed cache lines are not turned off to save leakage; rather, they become “available for replacement.” Decay modifies the underlying replacement policy (random, LRU) to enforce our high-level policy decisions, but in a very flexible and non-strict way. The statistical model allows us to assess a thread’s cache behavior under decay. Using this information, we can then apply high-level policies, such as policies that try to minimize the global miss rate or maximize the “usefulness” of the cache real estate, or even custom space-allocation policies driven by external QoS needs. To evaluate our approach we have implemented StatShare in a CMP simulator. Our results show that (i) managing sharing via decay outperforms coarse-grain partitioning schemes, and (ii) StatShare can yield run-time information that allows high-level policies to control decay.
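The idea that decayed lines are "available for replacement" rather than switched off can be illustrated with a minimal sketch. It is not the paper's implementation: the class `DecaySet`, the per-thread `decay_interval` table, and the use of a logical timestamp `now` are all hypothetical names chosen for illustration. The sketch assumes a single LRU cache set in which a line counts as decayed once it has gone unreferenced for longer than its owner thread's decay interval, and in which a decayed line still hits if it is referenced before being evicted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Line:
    tag: int          # address tag (assumed unique across threads)
    owner: int        # id of the thread that brought the line in
    last_access: int  # logical time of the most recent reference


class DecaySet:
    """One cache set whose LRU policy is biased by per-thread decay (illustrative)."""

    def __init__(self, ways: int, decay_interval: Dict[int, int]):
        self.ways = ways
        self.decay_interval = decay_interval  # thread id -> decay interval
        self.lines: List[Line] = []

    def _is_decayed(self, line: Line, now: int) -> bool:
        # Assumption: a line decays when idle longer than its owner's interval.
        return now - line.last_access > self.decay_interval[line.owner]

    def access(self, thread: int, tag: int, now: int) -> bool:
        """Return True on a hit, False on a miss (the line is then installed)."""
        for line in self.lines:
            if line.tag == tag:
                # Decayed lines stay valid, so a reference to one is still a hit.
                line.last_access = now
                return True
        if len(self.lines) == self.ways:
            # Prefer decayed lines as victims; fall back to plain LRU otherwise.
            decayed = [l for l in self.lines if self._is_decayed(l, now)]
            candidates = decayed or self.lines
            victim = min(candidates, key=lambda l: l.last_access)
            self.lines.remove(victim)
        self.lines.append(Line(tag, thread, now))
        return False
```

In this sketch, giving a thread a short decay interval makes its idle lines the preferred victims, so the thread is softly squeezed into less space without a hard partition; when no line has decayed, the set behaves like ordinary LRU. How decay intervals would be chosen from the statistical model is outside the scope of this example.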
