The evicted-address filter: A unified mechanism to address both cache pollution and thrashing

Off-chip main memory has long been a bottleneck for system performance. As the number of on-chip cores grows and memory pressure increases with it, effective cache utilization becomes important. In a system with limited cache space, we would ideally like to prevent 1) cache pollution, i.e., blocks with low reuse evicting blocks with high reuse from the cache, and 2) cache thrashing, i.e., blocks with high reuse evicting each other from the cache.
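The two problems can be illustrated with a small simulation. The sketch below is not the mechanism proposed in this work; it assumes a fully associative cache with plain LRU replacement, and the function name, traces, and cache size are hypothetical choices made only for illustration.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count hits for an access trace on a fully associative LRU cache.

    Illustrative baseline only (an assumption of this sketch), not the
    evicted-address filter itself.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits


# Thrashing: five high-reuse blocks cycle through a four-block cache,
# so each block is evicted just before it is needed again.
thrash_trace = list("ABCDE") * 10
thrash_hits = lru_hits(thrash_trace, capacity=4)

# Pollution: a three-block hot set is interleaved with scans of blocks
# that are never reused; the scans push the hot set out of the cache.
hot = ["A", "B", "C"]
pollute_trace = []
for i in range(10):
    pollute_trace += hot
    pollute_trace += [f"s{i}_{j}" for j in range(4)]  # low-reuse blocks
pollute_hits = lru_hits(pollute_trace, capacity=4)

# Reference point: the hot set alone fits in the cache and hits after warm-up.
hot_only_hits = lru_hits(hot * 10, capacity=4)

print(thrash_hits, pollute_hits, hot_only_hits)
```

Under these assumptions, both the thrashing trace and the polluted trace get no hits at all, while the hot set on its own hits on every access after the first three.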
