Avoiding conflict misses dynamically in large direct-mapped caches

This paper describes a method for improving the performance of a large direct-mapped cache by reducing the number of conflict misses. Our solution consists of two components. The first is an inexpensive hardware device called a Cache Miss Lookaside (CML) buffer, which detects conflicts by recording and summarizing a history of cache misses. The second is a software policy within the operating system's virtual memory system, which removes conflicts by dynamically remapping pages whenever large numbers of conflict misses are detected. Using trace-driven simulation of applications and the operating system, we show that a CML buffer enables a large direct-mapped cache to perform nearly as well as a two-way set-associative cache of equivalent size and speed, but at lower hardware cost and complexity.
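The abstract does not specify the CML buffer's internal organization or the exact remapping policy. The Python simulation below is therefore a hypothetical sketch, not the paper's design. It illustrates the general idea under stated assumptions:

- Sizes are assumed: 4 KB pages, a 16 KB direct-mapped cache with 32-byte lines, and 16 physical frames.
- Misses are counted per cache "page bin", meaning the page-sized slice of the cache that a physical frame maps to.
- When a bin's count reaches an assumed threshold and more than one resident page maps to that bin, the OS moves one of those pages to a free frame in the bin with the fewest recent misses.
- `DirectMappedCache`, `CMLBuffer`, `VirtualMemory`, `remap_from`, and `run` are illustrative names only.

```python
"""Hypothetical sketch: CML-style conflict detection plus OS page remapping.

All sizes, the threshold, the remapping rule, and every name here are
illustrative assumptions, not the mechanism described in the paper.
"""
from collections import defaultdict

PAGE_SIZE = 4096                     # assumed page size (bytes)
LINE_SIZE = 32                       # assumed cache line size (bytes)
CACHE_SIZE = 4 * PAGE_SIZE           # assumed cache size: 4 page-sized bins
NUM_BINS = CACHE_SIZE // PAGE_SIZE   # physical frame f maps to bin f % NUM_BINS
NUM_LINES = CACHE_SIZE // LINE_SIZE


class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES  # one resident line number per slot

    def access(self, paddr):
        """Return True on a hit; on a miss, fill the slot and return False."""
        line = paddr // LINE_SIZE
        index = line % NUM_LINES        # direct-mapped: exactly one candidate slot
        if self.tags[index] == line:
            return True
        self.tags[index] = line
        return False


class CMLBuffer:
    """Hypothetical miss-history buffer: one miss counter per cache bin."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = defaultdict(int)

    def record_miss(self, paddr):
        """Count a miss; return the bin if its count hit the threshold, else None."""
        b = (paddr // PAGE_SIZE) % NUM_BINS
        self.counts[b] += 1
        if self.counts[b] >= self.threshold:
            self.counts[b] = 0          # assumption: restart this bin's history
            return b
        return None


class VirtualMemory:
    """Hypothetical OS policy: move a page out of a hot bin into the coolest bin."""

    def __init__(self, num_frames):
        self.page_table = {}            # virtual page number -> physical frame
        self.free = list(range(num_frames))
        self.remaps = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:  # first touch: take the next free frame
            self.page_table[vpn] = self.free.pop(0)
        return self.page_table[vpn] * PAGE_SIZE + offset

    def remap_from(self, hot_bin, cml):
        residents = [v for v, f in self.page_table.items() if f % NUM_BINS == hot_bin]
        if len(residents) < 2:
            return                      # no second page to conflict with
        candidates = [f for f in self.free if f % NUM_BINS != hot_bin]
        if not candidates:
            return
        # Pick the free frame whose bin has the fewest recorded misses.
        target = min(candidates, key=lambda f: cml.counts[f % NUM_BINS])
        vpn = residents[-1]
        self.free.remove(target)
        self.free.append(self.page_table[vpn])
        self.page_table[vpn] = target   # a real OS would also copy the page
        self.remaps += 1


def run(trace, cml_threshold=None, num_frames=16):
    """Return (miss count, remap count) for a trace of virtual addresses."""
    cache, vm = DirectMappedCache(), VirtualMemory(num_frames)
    cml = CMLBuffer(cml_threshold) if cml_threshold is not None else None
    misses = 0
    for vaddr in trace:
        paddr = vm.translate(vaddr)
        if not cache.access(paddr):
            misses += 1
            if cml is not None:
                hot = cml.record_miss(paddr)
                if hot is not None:
                    vm.remap_from(hot, cml)
    return misses, vm.remaps


# Touch pages 0..4 so that pages 0 and 4 land in frames sharing bin 0,
# then alternate between them to provoke repeated conflict misses.
trace = [v * PAGE_SIZE for v in range(5)] + [0, 4 * PAGE_SIZE] * 100
print(run(trace))                    # (205, 0): every alternating access misses
print(run(trace, cml_threshold=8))   # (13, 1): one remap removes the conflict
```

Without detection, pages 0 and 4 evict each other on every access. With the hypothetical buffer, a single remap moves page 4 into a different bin, and later accesses hit. This is the qualitative effect the abstract attributes to combining a CML buffer with dynamic page remapping.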