A reduced overhead replacement policy for Chip Multiprocessors having victim retention

Because memory accesses in today's applications are distributed non-uniformly, some cache sets are heavily used while others remain underutilized. CMP-VR is an approach that dynamically increases the associativity of heavily used sets without increasing the cache size. It achieves this by reserving a certain number of ways in each set to be shared with other sets, while the remaining ways stay private to the set. The shared ways from all sets form a common reserve storage, and the private ways form the normal storage. CMP-VR uses the LRU replacement policy in both partitions. This paper presents an optimization of CMP-VR that removes the LRU policy from the normal storage of each set. A victim evicted from the normal storage can still reside in the reserve (shared) storage, from which it is eventually evicted using the LRU policy. The optimization therefore does not hamper cache performance, and it removes the complexity of implementing true LRU in the normal storage. Storage analysis shows a 7-18% reduction in replacement cost. CPI and miss rate also improve by 4% and 16%, respectively, for a 4 MB, 8-way set-associative LLC.
