Optimal Worst Case Formulas Comparing Cache Memory Associativity

In this paper we derive a worst-case formula comparing the number of cache hits for two different cache memories. From this, various other bounds on cache memory performance can be derived. Consider an arbitrary program P which is to be executed on a computer with two alternative cache memories. The first cache is set-associative or direct-mapped: it has k sets and u blocks in each set, and is called a (k,u)-cache. The other is a fully associative cache with q blocks---a (1,q)-cache. We derive an explicit formula for the ratio of the number of cache hits h(P,k,u) for a (k,u)-cache to that for a (1,q)-cache for a worst-case program P, assuming that the mappings of the program variables to the cache blocks are optimal. The formula quantifies the ratio $$ \inf_P \frac{h(P,k,u)}{h(P,1,q)}, $$ where the infimum is taken over all programs P with n variables; it is a function of the parameters n, k, u, and q only. Note that computing the quantity $h(P,k,u)$ is NP-hard. We assume the commonly used LRU (least recently used) replacement policy, that each variable fits in one memory block, and that each variable is free to be mapped to any set. Since the bound is decreasing in the parameter n, it is an optimal bound for all programs with at most n variables. The formula for cache hits allows us to derive optimal bounds comparing the access times of cache memories. The formula also gives bounds (these are not optimal, however) for any other replacement policy, for direct-mapped versus set-associative caches, and for programs with variables larger than the cache memory blocks.
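The hit count h(P,k,u) for a given variable-to-set placement can be made concrete with a small simulator. The sketch below is illustrative only (it is not part of the paper): `lru_hits`, the trace representation, and the `placement` argument are names introduced here, and the placement is supplied rather than optimized, since finding the optimal mapping is NP-hard as noted above. The usage example shows the classic worst case for LRU: cycling through q+1 distinct variables yields zero hits in a fully associative (1,q)-cache.

```python
from collections import OrderedDict

def lru_hits(trace, num_sets, ways, placement):
    """Count cache hits for a (num_sets, ways)-cache under LRU.

    `trace` is the sequence of variables the program accesses;
    `placement` maps a variable to its set index (the paper assumes
    this mapping is chosen optimally; here it is simply given).
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for var in trace:
        s = sets[placement(var)]
        if var in s:
            hits += 1
            s.move_to_end(var)          # mark as most recently used
        else:
            if len(s) == ways:
                s.popitem(last=False)   # evict the least recently used block
            s[var] = True
    return hits

# Cyclic access to q+1 variables defeats a fully associative
# (1, q)-cache under LRU: every access evicts the block needed next.
q = 4
trace = list(range(q + 1)) * 10         # 50 accesses to 5 variables
print(lru_hits(trace, 1, q, lambda v: 0))      # → 0
print(lru_hits(trace, 1, q + 1, lambda v: 0))  # → 45 (only the first 5 miss)
```

With one extra block (a (1,q+1)-cache) the same trace hits on every access after the initial cold misses, which is why worst-case ratios of the kind derived in the paper are sensitive to the relation between n and the cache parameters.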
