Footprint modeling of cache associativity and granularity

Associativity and sub-block granularity are two common techniques for efficient caching. This short paper presents a parameterized, composable model for each technique. It shows how the new models are more general, more accurate, or more efficient than previous models of either technique, and how they can be combined to model a cache that uses both techniques, i.e., a sub-block set-associative cache.
