Improving Pelifo Cache Replacement Policy: Hardware Reduction and Thread-Aware Extension

Studying blocks behavior during their lifetime in cache can provide useful information to reduce the miss rate and therefore improve processor performance. According to this rationale, the peLIFO replacement algorithm [M. Chaudhuri, Proc. Micro'09, New York, 12–16 December, 2009, pp. 401–412], which learns dynamically the number of cache ways required to satisfy short-term reuses preserving the remaining ways for long-term reuses, has been recently proposed. In this paper, we propose several changes to the original peLIFO policy in order to reduce the implementation complexity involved, and we extend the algorithm to a shared-cache environment considering dynamic information about threads behavior to improve cache efficiency. Experimental results confirm that our simplification techniques reduce the required hardware with a negligible performance penalty, while the best of our thread-aware extension proposals reduces average CPI by 8.7% and 15.2% on average compared to the original peLIFO and LRU respectively for a set of 43 multi-programmed workloads on an 8 MB 16-way set associative shared L2 cache.

[1]  Mazen Kharbutli,et al.  LACS: A Locality-Aware Cost-Sensitive Cache Replacement Algorithm , 2014, IEEE Transactions on Computers.

[2]  Aamer Jaleel,et al.  The gradient-based cache partitioning algorithm , 2012, TACO.

[3]  Laszlo A. Belady,et al.  A Study of Replacement Algorithms for Virtual-Storage Computer , 1966, IBM Syst. J..

[4]  B. Jacob,et al.  CMP $ im : A Pin-Based OnThe-Fly Multi-Core Cache Simulator , 2008 .

[5]  José González,et al.  Speculative execution via address prediction and data prefetching , 1997, ICS '97.

[6]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[7]  Mikko H. Lipasti,et al.  Value locality and load value prediction , 1996, ASPLOS VII.

[8]  Víctor Viñals,et al.  Exploiting reuse locality on inclusive shared last-level caches , 2013, TACO.

[9]  Mainak Chaudhuri,et al.  Pseudo-LIFO: The foundation of a new family of replacement policies for last-level caches , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[10]  Aamer Jaleel,et al.  Adaptive insertion policies for high performance caching , 2007, ISCA '07.

[11]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[12]  Onur Mutlu,et al.  A Case for MLP-Aware Cache Replacement , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[13]  Hassan Ghasemzadeh,et al.  Modified pseudo LRU replacement algorithm , 2006, 13th Annual IEEE International Symposium and Workshop on Engineering of Computer-Based Systems (ECBS'06).

[14]  Gabriel H. Loh,et al.  PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches , 2009, ISCA '09.

[15]  Aamer Jaleel,et al.  CRUISE: cache replacement and utility-aware scheduling , 2012, ASPLOS XVII.

[16]  William J. Dally,et al.  Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[17]  Aamer Jaleel,et al.  Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[18]  Aamer Jaleel,et al.  High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[19]  Jean-Loup Baer,et al.  Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.

[20]  Carole-Jean Wu,et al.  SHiP: Signature-based Hit Predictor for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).