Implementation Issues in Modern Cache Memory

As the performance gap between processors and main memory continues to widen, increasingly aggressive cache memory implementations are needed to bridge it. In this paper, we consider some of the issues involved in implementing highly optimized cache memories and survey techniques that help meet the increasingly stringent design targets and constraints of modern processors. In particular, we consider techniques that allow the cache to be accessed quickly while still achieving a good hit ratio. We also consider issues such as area cost and bandwidth requirements. Trace-driven simulations of a TPC-C-like workload and selected applications from the SPEC95 benchmark suite are used to compare the performance of some of the techniques.
