Modeling Cache Sharing on Chip Multiprocessor Architectures

As CMPs emerge as the dominant architecture for a wide range of platforms (from embedded systems and game consoles to PCs and servers), managing on-chip resources such as shared caches becomes a necessity. In this paper we propose a new statistical model of a CMP shared cache that describes not only cache sharing but also its management via a novel fine-grain mechanism. Our model, called StatShare, accurately describes the behavior of the sharing threads using run-time information (reuse-distance information for memory accesses) and helps us understand how effectively each thread uses its cache space. The mechanism for managing the cache at cache-line granularity is inspired by cache decay but contains important differences: decayed cache lines are not turned off to save leakage but are instead made "available for replacement." Decay modifies the underlying replacement policy (random, LRU) to control sharing in a flexible, non-strict way, which makes it superior to strict cache-partitioning schemes, both fine and coarse grained. The statistical model allows us to assess a thread's cache behavior under decay. Detailed CMP simulations show that: (i) StatShare accurately predicts thread behavior in a shared cache, and (ii) managing sharing via decay, in combination with StatShare's run-time information, can enforce external QoS requirements or various high-level fairness policies.
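
To make the decay-based management concrete, the following C sketch illustrates the idea of treating decayed lines as preferred replacement victims rather than powering them off. It is illustrative only and not the paper's implementation; the structure names, the per-thread decay_interval table, and the cycle-granularity tick function are assumptions introduced for the example.

```c
/*
 * Minimal sketch (not the paper's implementation) of decay-assisted
 * replacement in a set-associative shared cache: lines whose decay
 * counters have expired are not turned off but simply become the
 * preferred victims, falling back to plain LRU when no line in the
 * set has decayed.  All names and parameters are illustrative.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define WAYS 8

typedef struct {
    uint64_t tag;
    int      owner;        /* id of the thread that fetched the line    */
    unsigned lru_age;      /* larger = older (LRU bookkeeping)          */
    unsigned decay_left;   /* remaining "lifetime"; 0 => line decayed   */
    bool     valid;
} line_t;

/* One decay interval per thread: a policy can shrink the effective
 * footprint of a cache-hungry thread by decaying its lines faster.   */
static unsigned decay_interval[4] = { 4096, 4096, 512, 4096 };

/* Pick a victim way: any decayed line is "available for replacement"
 * ahead of live ones; otherwise fall back to ordinary LRU.           */
static int pick_victim(line_t set[WAYS])
{
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)          return w;   /* free way            */
        if (set[w].decay_left == 0) return w;   /* decayed => victim   */
        if (set[w].lru_age > set[victim].lru_age)
            victim = w;                         /* LRU fallback        */
    }
    return victim;
}

/* On a hit or a fill, reset the line's decay counter to its owner's
 * interval; counters tick down as (approximate) time passes.          */
static void touch(line_t *ln, int thread)
{
    ln->owner      = thread;
    ln->decay_left = decay_interval[thread];
    ln->lru_age    = 0;
}

static void tick(line_t set[WAYS])
{
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid) continue;
        set[w].lru_age++;
        if (set[w].decay_left > 0) set[w].decay_left--;
    }
}

int main(void)
{
    line_t set[WAYS] = {{ 0 }};

    /* Fill the set from thread 2 (short decay interval), then age it. */
    for (int w = 0; w < WAYS; w++) {
        set[w].valid = true;
        set[w].tag   = 100 + w;
        touch(&set[w], 2);
    }
    for (int t = 0; t < 600; t++) tick(set);   /* > 512, so lines decay */

    /* A miss from another thread now finds a decayed line immediately. */
    int v = pick_victim(set);
    printf("victim way %d (decayed=%d, owner=%d)\n",
           v, set[v].decay_left == 0, set[v].owner);
    return 0;
}
```

The key property the sketch tries to convey is that shortening a thread's decay interval reduces its effective share of the cache without reserving ways or sets for anyone, which is why decay-based control is softer than strict partitioning.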
