SCIP: Selective cache insertion and bypassing to improve the performance of last-level caches

The design of an effective last-level cache (LLC) is crucial to the overall processor performance and, consequently, continues to be the center of substantial research. Unfortunately, LLCs in modern high-performance processors are not used efficiently. One major problem suffered by LLCs is their low hit rates caused by the large fraction of cache blocks that do not get re-accessed after being brought into the LLC following a cache miss. These blocks do not contribute any cache hits and usually induce cache pollution and thrashing. Cache bypassing presents an effective solution to this problem. Cache blocks that are predicted not to be accessed while residing in the cache are not inserted into the LLC following a miss, instead they bypass the LLC and are only inserted in the higher cache levels. This paper presents a simple, low-hardware overhead, yet effective, cache bypassing algorithm that dynamically chooses which blocks to insert into the LLC and which to bypass it following a miss based on past access/bypass patterns. Our proposed algorithm is thoroughly evaluated using a detailed simulation environment where its effectiveness, performance-improvement capabilities, and robustness are demonstrated. Moreover, it is shown to outperform the state-of-the-art cache bypassing algorithm in both a uniprocessor and a multi-core processor settings.

[1]  Manoj Franklin,et al.  Balancing thoughput and fairness in SMT processors , 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS..

[2]  Per Stenström,et al.  Enhancing Last-Level Cache Performance by Block Bypassing and Early Miss Determination , 2006, Asia-Pacific Computer Systems Architecture Conference.

[3]  Mazen Kharbutli,et al.  Improving cache performance by combining cost-sensitivity and locality principles in cache replacement algorithms , 2010, 2010 IEEE International Conference on Computer Design.

[4]  Mazen Kharbutli,et al.  LACS: A Locality-Aware Cost-Sensitive Cache Replacement Algorithm , 2014, IEEE Transactions on Computers.

[5]  Xu Cheng,et al.  Optimal bypass monitor for high performance last-level caches , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[6]  Mainak Chaudhuri,et al.  Introducing Hierarchy-awareness in replacement and bypass algorithms for last-level caches , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[7]  Mainak Chaudhuri,et al.  Bypass and insertion algorithms for exclusive last-level caches , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[8]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[9]  Josep Torrellas,et al.  A direct-execution framework for fast and accurate simulation of superscalar processors , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).

[10]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[11]  Wen-mei W. Hwu,et al.  Run-Time Cache Bypassing , 1999, IEEE Trans. Computers.

[12]  Onur Mutlu,et al.  The evicted-address filter: A unified mechanism to address both cache pollution and thrashing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[13]  S. Kim,et al.  Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[14]  Carole-Jean Wu,et al.  SHiP: Signature-based Hit Predictor for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[15]  Saurabh Gupta,et al.  Adaptive Cache Bypassing for Inclusive Last Level Caches , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[16]  Laszlo A. Belady,et al.  A Study of Replacement Algorithms for Virtual-Storage Computer , 1966, IBM Syst. J..

[17]  Aamer Jaleel,et al.  High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[18]  Sally A. McKee,et al.  Global management of cache hierarchies , 2010, CF '10.

[19]  André Seznec,et al.  Exploiting Single-Usage for Effective Memory Management , 2007, Asia-Pacific Computer Systems Architecture Conference.

[20]  Jaehyuk Huh,et al.  Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[21]  Jaejin Lee,et al.  Eliminating conflict misses using prime number-based cache indexing , 2005, IEEE Transactions on Computers.

[22]  Yan Solihin,et al.  Counter-Based Cache Replacement and Bypassing Algorithms , 2008, IEEE Transactions on Computers.