Reuse-based online models for caches

We develop a reuse distance/stack distance based analytical modeling framework for efficient, online prediction of cache performance for a range of cache configurations and replacement policies LRU, PLRU, RANDOM, NMRU. Our framework unifies existing cache miss rate prediction techniques such as Smith's associativity model, Poisson variants, and hardware way-counter based schemes. We also show how to adapt LRU way-counters to work when the number of sets in the cache changes. As an example application, we demonstrate how results from our models can be used to select, based on workload access characteristics, last-level cache configurations that aim to minimize energy-delay product.

[1]  Alan Jay Smith,et al.  A Comparative Study of Set Associative Memory Mapping Algorithms and Their Use for Cache and Main Memory , 1978, IEEE Transactions on Software Engineering.

[2]  Frank Vahid,et al.  A One-Shot Configurable-Cache Tuner for Improved Energy and Performance , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[3]  Michael L. Scott,et al.  Integrating adaptive on-chip storage structures for reduced dynamic power , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[4]  Zhen Yang,et al.  Modeling and Stack Simulation of CMP Cache Capacity and Accessibility , 2009, IEEE Transactions on Parallel and Distributed Systems.

[5]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[6]  David A. Wood,et al.  A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches , 1994, IEEE Trans. Computers.

[7]  Christian Bienia,et al.  Benchmarking modern multiprocessors , 2011 .

[8]  Daniel Sánchez,et al.  Implementing Signatures for Transactional Memory , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[9]  Francisco J. Cazorla,et al.  Adapting cache partitioning algorithms to pseudo-LRU replacement policies , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[10]  Srinivas Devadas,et al.  Dynamic Cache Partitioning for CMP/SMT Systems , 2004 .

[11]  David A. Wood,et al.  IPC Considered Harmful for Multiprocessor Workloads , 2006, IEEE Micro.

[12]  Alan Jay Smith,et al.  Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.

[13]  Babak Falsafi,et al.  Modeling cost/performance of a parallel computer simulator , 1997, TOMC.

[14]  Chen Ding,et al.  Program locality analysis using reuse distance , 2009, TOPL.

[15]  Harold S. Stone,et al.  Footprints in the cache , 1986, SIGMETRICS '86/PERFORMANCE '86.

[16]  Chen Ding,et al.  Reuse Distance Analysis , 2001 .

[17]  Wei Zhang,et al.  Exploiting stack distance to estimate worst-case data cache performance , 2009, SAC '09.

[18]  Mark Horowitz,et al.  Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.

[19]  Jan Reineke,et al.  Relative competitive analysis of cache replacement policies , 2008, LCTES '08.

[20]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[21]  Thomas Roberts Puzak,et al.  Analysis of cache replacement-algorithms , 1985 .

[22]  M. Hofri,et al.  The coupon-collector problem revisited — a survey of engineering problems and computational methods , 1997 .

[23]  J. Kelly Flanagan,et al.  Facilitating level three cache studies using set sampling , 2000, 2000 Winter Simulation Conference Proceedings (Cat. No.00CH37165).

[24]  G. Edward Suh,et al.  Dynamic Partitioning of Shared Cache Memory , 2004, The Journal of Supercomputing.

[25]  Larry Carter,et al.  Universal classes of hash functions (Extended Abstract) , 1977, STOC '77.

[26]  Kristof Beyls,et al.  Reuse Distance as a Metric for Cache Behavior. , 2001 .

[27]  Eric M. Schwarz,et al.  IBM POWER6 microarchitecture , 2007, IBM J. Res. Dev..

[28]  G. Edward Suh,et al.  A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[29]  Frank Vahid,et al.  A highly configurable cache architecture for embedded systems , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[30]  Rudolf Eigenmann,et al.  SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.

[31]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[32]  Mark D. Hill,et al.  Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[33]  Janak H. Patel,et al.  Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems , 1988, IEEE Trans. Computers.

[34]  Michael Stumm,et al.  RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations , 2009, ASPLOS.

[35]  Milo M. K. Martin,et al.  Simulating a $ 2 M Commercial Server on a $ 2 K PC T , 2001 .

[36]  David Blaauw,et al.  Drowsy caches: simple techniques for reducing leakage power , 2002, ISCA.

[37]  Varghese George,et al.  Power management of the third generation intel core micro architecture formerly codenamed ivy bridge , 2012, 2012 IEEE Hot Chips 24 Symposium (HCS).

[38]  Onur Mutlu,et al.  A Case for MLP-Aware Cache Replacement , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[39]  C. Cascaval,et al.  Calculating stack distances efficiently , 2003, MSP '02.

[40]  Laszlo A. Belady,et al.  A Study of Replacement Algorithms for Virtual-Storage Computer , 1966, IBM Syst. J..

[41]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[42]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[43]  Charles A. Micchelli,et al.  Binomial Matrices , 2001, Adv. Comput. Math..

[44]  Mark Horowitz,et al.  An analytical cache model , 1989, TOCS.

[45]  Kimming So,et al.  Cache Operations by MRU Change , 1988, IEEE Trans. Computers.

[46]  Fang Liu,et al.  Characterizing and modeling the behavior of context switch misses! , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[47]  MutluOnur,et al.  A Case for MLP-Aware Cache Replacement , 2006 .