Paper information - Avoiding conflict misses dynamically in large direct-mapped caches

This paper describes a method for improving the performance of a large direct-mapped cache by reducing the number of conflict misses. Our solution consists of two components: an inexpensive hardware device called a Cache Miss Lookaside (CML) buffer that detects conflicts by recording and summarizing a history of cache misses, and a software policy within the operating system's virtual memory system that removes conflicts by dynamically remapping pages whenever large numbers of conflict misses are detected. Using trace-driven simulation of applications and the operating system, we show that a CML buffer enables a large direct-mapped cache to perform nearly as well as a two-way set associative cache of equivalent size and speed, although with lower hardware cost and complexity.
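The two components described in the abstract can be illustrated with a toy model. The sketch below is not the paper's design. The page size, cache size, buffer capacity, miss threshold, least-recently-missed eviction, and every name in it (`CMLBuffer`, `find_conflicts`, `remap`) are assumptions chosen only for illustration. It models a small table that counts recent misses per physical page, and a software policy that treats two heavily missing pages mapping to the same cache region as a conflict and moves one of them to a free page that maps elsewhere.

```python
from collections import OrderedDict

PAGE_SIZE = 4096                       # assumed page size in bytes
CACHE_SIZE = 512 * 1024                # assumed direct-mapped cache size in bytes
NUM_COLORS = CACHE_SIZE // PAGE_SIZE   # number of page-sized regions in the cache


def color(page):
    """Cache region ("color") that a physical page number maps to."""
    return page % NUM_COLORS


class CMLBuffer:
    """Toy stand-in for the hardware: a small table of per-page miss counts."""

    def __init__(self, entries=4, threshold=3):
        self.entries = entries        # assumed buffer capacity
        self.threshold = threshold    # assumed miss count that marks a page as hot
        self.counts = OrderedDict()   # physical page number -> miss count

    def record_miss(self, phys_addr):
        """Record one cache miss; return True once the page's count reaches the threshold."""
        page = phys_addr // PAGE_SIZE
        if page in self.counts:
            self.counts[page] += 1
            self.counts.move_to_end(page)
        else:
            if len(self.counts) >= self.entries:
                self.counts.popitem(last=False)  # evict the least recently missed page
            self.counts[page] = 1
        return self.counts[page] >= self.threshold

    def hot_pages(self):
        """Pages whose recorded miss count has reached the threshold."""
        return [p for p, c in self.counts.items() if c >= self.threshold]


def find_conflicts(buf):
    """Software side: group hot pages by color; a color with 2+ hot pages is a conflict."""
    by_color = {}
    for page in buf.hot_pages():
        by_color.setdefault(color(page), []).append(page)
    return {c: pages for c, pages in by_color.items() if len(pages) > 1}


def remap(free_pages, busy_colors):
    """Take a free physical page whose color is not busy, or return None if there is none."""
    for candidate in free_pages:
        if color(candidate) not in busy_colors:
            free_pages.remove(candidate)
            return candidate
    return None


if __name__ == "__main__":
    buf = CMLBuffer()
    # Two pages exactly one cache size apart collide in a direct-mapped cache.
    for _ in range(3):
        buf.record_miss(0 * PAGE_SIZE)
        buf.record_miss(NUM_COLORS * PAGE_SIZE)
    conflicts = find_conflicts(buf)
    print("conflicts by color:", conflicts)
    print("remap target:", remap([2 * NUM_COLORS, NUM_COLORS + 1], set(conflicts)))
```

In a real system the remapping step would also have to copy the page's contents and update the page tables and TLB; the sketch only shows how a destination page could be chosen.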
