Cache Management with Partitioning-Aware Eviction and Thread-Aware Insertion/Promotion Policy

With recent advances in processor technology, LRU-managed shared last-level caches (LLCs) have been widely employed in modern chip multiprocessors (CMPs). However, prior research [1,2,8,9] indicates that LRU can severely degrade LLC performance, and hence overall CMP performance, when inter-thread interference occurs or the working set exceeds the cache size. Existing approaches to this problem yield limited overall improvement because they typically target a single type of memory access behavior and therefore do not fully account for the tradeoffs among different access behaviors. In this paper, we propose a unified cache management policy, Partitioning-Aware Eviction and Thread-aware Insertion/Promotion (PAE-TIP), that combines effective capacity management with adaptive insertion/promotion to improve overall cache performance. Specifically, PAE-TIP employs an adaptive mechanism to decide where incoming lines are inserted and how far hit lines are promoted, and it selects victim lines according to the target partitioning given by utility-based cache partitioning (UCP) [2]. We show that PAE-TIP accommodates a variety of memory access behaviors simultaneously and provides a good tradeoff for overall cache performance improvement while keeping hardware and design overhead competitively low. Our evaluation on 4-way CMPs shows that a PAE-TIP-managed LLC improves overall performance by 19.3% on average over the LRU policy. Furthermore, PAE-TIP delivers 1.09x the performance of PIPP, 1.11x that of TADIP, and 1.12x that of UCP.
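To make the mechanism concrete, the following Python sketch shows one plausible way a PAE-TIP-style cache set could combine partitioning-aware eviction with per-thread insertion and promotion. It is a minimal illustration under assumptions, not the authors' implementation: the parameter names `targets`, `insert_pos`, and `promote_step` are hypothetical stand-ins for the UCP-derived way targets and the adaptively chosen insertion/promotion positions described above, and the paper's runtime adaptation and UCP monitoring logic are omitted.

```python
# Minimal sketch of a PAE-TIP-style cache set (illustrative only).
# Assumptions: `targets` holds UCP-style per-thread way targets;
# `insert_pos` and `promote_step` stand in for the adaptively chosen
# per-thread insertion position and promotion distance.

class PAETIPSet:
    def __init__(self, num_ways, targets, insert_pos, promote_step):
        self.num_ways = num_ways
        self.stack = []                      # recency stack: index 0 = MRU, last = LRU
        self.targets = targets               # per-thread target way counts (from UCP)
        self.insert_pos = insert_pos         # per-thread insertion position on a miss
        self.promote_step = promote_step     # per-thread promotion distance on a hit

    def _occupancy(self, tid):
        # Number of ways currently owned by thread `tid` in this set.
        return sum(1 for t, _ in self.stack if t == tid)

    def _evict(self):
        # Partitioning-aware eviction: scanning from the LRU end, evict the
        # first line owned by a thread that exceeds its target allocation;
        # fall back to plain LRU if no thread is over its target.
        for i in range(len(self.stack) - 1, -1, -1):
            tid = self.stack[i][0]
            if self._occupancy(tid) > self.targets[tid]:
                del self.stack[i]
                return
        self.stack.pop()

    def access(self, tid, tag):
        """Return True on a hit, False on a miss."""
        for i, (owner, g) in enumerate(self.stack):
            if g == tag:
                # Hit: promote toward MRU by a per-thread step instead of
                # always moving the line to the MRU position as LRU would.
                new_pos = max(0, i - self.promote_step[tid])
                self.stack.insert(new_pos, self.stack.pop(i))
                return True
        if len(self.stack) == self.num_ways:
            self._evict()
        # Miss: insert at a per-thread position instead of always at MRU;
        # the stored tid records which thread owns the line for partitioning.
        self.stack.insert(min(self.insert_pos[tid], len(self.stack)), (tid, tag))
        return False
```

For example, an 8-way set shared by two threads might be instantiated as `PAETIPSet(8, targets=[6, 2], insert_pos=[0, 6], promote_step=[7, 1])`, giving the cache-friendly thread MRU-like insertion and aggressive promotion while the streaming thread is inserted near the LRU end and promoted only gradually; these values are illustrative, not taken from the paper.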

[1] G. Edward Suh, et al. Dynamic Partitioning of Shared Cache Memory, 2004, The Journal of Supercomputing.

[2] S. Kim, et al. Fair cache sharing and partitioning in a chip multiprocessor architecture, 2004, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004).

[3] David A. Wood, et al. ASR: Adaptive Selective Replication for CMP Caches, 2006, 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[4] Aamer Jaleel, et al. Adaptive insertion policies for managing shared caches, 2008, International Conference on Parallel Architectures and Compilation Techniques (PACT).

[6] Per Stenström, et al. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors, 2007, IEEE 13th International Symposium on High Performance Computer Architecture.

[7] Yan Solihin, et al. Counter-Based Cache Replacement and Bypassing Algorithms, 2008, IEEE Transactions on Computers.

[8] Babak Falsafi, et al. Reactive NUCA: near-optimal block placement and replication in distributed caches, 2009, ISCA '09.

[9] Miodrag Potkonjak, et al. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems, 1997, Proceedings of the 30th Annual International Symposium on Microarchitecture.

[10] Manoj Franklin, et al. Balancing throughput and fairness in SMT processors, 2001, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[12] Gabriel H. Loh, et al. Double-DIP: Augmenting DIP with Adaptive Promotion Policies to Manage Shared L2 Caches, 2008.

[13] Ravi R. Iyer, et al. CQoS: a framework for enabling QoS in shared caches of CMP platforms, 2004, ICS '04.

[14] Gabriel H. Loh, et al. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches, 2009, ISCA '09.

[15] Jichuan Chang, et al. Cooperative Caching for Chip Multiprocessors, 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[16] Pedro López, et al. Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors, 2007, 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07).

[17] Yale N. Patt, et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, 2006, 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[18] Srihari Makineni, et al. Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource, 2006, International Conference on Parallel Architectures and Compilation Techniques (PACT).

[19] Jaehyuk Huh, et al. Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency, 2008, 41st IEEE/ACM International Symposium on Microarchitecture.

[20] Aamer Jaleel, et al. Adaptive insertion policies for high performance caching, 2007, ISCA '07.

[21] John Turek, et al. Optimal Partitioning of Cache Memory, 1992, IEEE Transactions on Computers.

[22] Jichuan Chang, et al. Cooperative cache partitioning for chip multiprocessors, 2007, ICS '07.