Page placement algorithms for large real-indexed caches

When a computer system supports both paged virtual memory and large real-indexed caches, cache performance depends in part on where pages are placed in main memory. To date, most operating systems place a page by selecting an arbitrary page frame from the pool of frames made available by the page replacement algorithm. We give a simple model showing that this naive (arbitrary) page placement leads to up to 30% unnecessary cache conflicts. We develop several page placement algorithms, called careful-mapping algorithms, that try to select, from the pool of available page frames, a frame that is likely to reduce cache contention. Using trace-driven simulation, we find that careful mapping results in 10–20% fewer (dynamic) cache misses than naive mapping for a direct-mapped, real-indexed, multimegabyte cache. Our results thus suggest that careful mapping by the operating system can achieve about half of the cache-miss reduction obtained by doubling the cache size (or associativity).
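The contrast between naive and careful placement can be illustrated with a minimal sketch. It is not one of the paper's algorithms: the page size, the cache size, the "least-loaded cache bin" heuristic, and every name (`cache_bin`, `pick_naive`, `pick_careful`, `place`) are assumptions made for illustration only. The sketch relies on one property of a direct-mapped real-indexed cache: page frames whose numbers are congruent modulo (cache size / page size) compete for the same cache lines.

```python
import random
from collections import Counter

# Assumed parameters, chosen only for illustration.
PAGE_SIZE = 4 * 1024                 # bytes per page
CACHE_SIZE = 1024 * 1024             # bytes in a direct-mapped real-indexed cache
NUM_BINS = CACHE_SIZE // PAGE_SIZE   # page-sized regions ("bins") in the cache


def cache_bin(frame: int) -> int:
    """Cache bin that a physical page frame maps to."""
    return frame % NUM_BINS


def pick_naive(free_frames, bin_load, rng=random):
    """Naive mapping: take an arbitrary frame from the free pool."""
    return rng.choice(sorted(free_frames))


def pick_careful(free_frames, bin_load):
    """Hypothetical careful mapping: prefer the free frame whose cache bin
    currently holds the fewest mapped pages (ties go to the lowest frame)."""
    return min(free_frames, key=lambda f: (bin_load[cache_bin(f)], f))


def place(free_frames: set, bin_load: Counter, pick) -> int:
    """Place one page: choose a frame with `pick`, then update the bookkeeping."""
    frame = pick(free_frames, bin_load)
    free_frames.remove(frame)
    bin_load[cache_bin(frame)] += 1
    return frame


if __name__ == "__main__":
    for name, pick in (("naive", pick_naive), ("careful", pick_careful)):
        free, load = set(range(4 * NUM_BINS)), Counter()
        for _ in range(NUM_BINS):
            place(free, load, pick)
        print(name, "max pages sharing one cache bin:", max(load.values()))
```

Under these assumptions, the careful policy spreads pages evenly across bins whenever the free pool allows it, while the naive policy lets several pages land in the same bin by chance. A real operating system would also have to account for which pages are actually active and for a free pool that may not contain a frame in every bin.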
