Modern high-performance processors use multi-level cache hierarchies to help tolerate the increasing latency of main memory. Most of these caches handle store operations with either a write-through or a writeback policy. Write-through caches propagate data to more distant memory levels every time a store occurs, which requires very high bandwidth between levels of the memory hierarchy. Writeback caches can significantly reduce the bandwidth required between caches and memory. They mark a cache line dirty when a store updates it and write the line to the memory system only when it is evicted. Unfortunately, for applications that incur many cache misses on streaming data, writeback designs can degrade overall system performance by clustering bus activity: dirty lines being evicted contend for the bus with the data being fetched into the cache. In this paper we present a new technique called Eager Writeback, which redistributes and balances memory traffic by writing back ("cleaning") dirty cache lines before they are evicted. Eager Writeback can be viewed as a compromise between the write-through and writeback policies: dirty lines are written later than under write-through, but earlier than under writeback. We show that this approach reduces the large number of writes seen in a write-through design, while avoiding the performance degradation caused by clustered bus traffic in a writeback design.
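To make the three policies concrete, the following is a minimal Python sketch of a toy set-associative LRU cache that counts memory writes and records when they happen. It rests on assumptions that are ours and not taken from the text above. The cleaning trigger is a hypothetical `idle_bus_slot()` hook: on each idle bus slot the eager policy writes back one dirty line that sits in the least-recently-used position of its set, and the line stays cached but clean. The class name, the hook, the cache geometry and the trace are all illustrative. The sketch is not a description of the paper's hardware.

```python
from collections import OrderedDict


class Cache:
    """Toy set-associative LRU cache that counts memory writes under one store policy."""

    POLICIES = ("write-through", "writeback", "eager")

    def __init__(self, num_sets, ways, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.num_sets, self.ways, self.policy = num_sets, ways, policy
        # Per set: tag -> dirty flag, ordered from LRU (first) to MRU (last).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.store_writes = 0     # write-through: one memory write per store
        self.eviction_writes = 0  # dirty victims written while a fill is pending
        self.eager_writes = 0     # dirty lines cleaned early, in idle bus slots

    def access(self, addr, is_store):
        lines = self.sets[addr % self.num_sets]
        tag = addr // self.num_sets
        if tag in lines:
            lines.move_to_end(tag)  # hit: promote to MRU
        else:
            if len(lines) == self.ways:
                _, dirty = lines.popitem(last=False)  # evict the LRU line
                if dirty:
                    self.eviction_writes += 1
            lines[tag] = False  # allocate a clean line on every miss
        if is_store:
            if self.policy == "write-through":
                self.store_writes += 1  # propagate the store immediately
            else:
                lines[tag] = True  # writeback/eager: only mark the line dirty

    def idle_bus_slot(self):
        """Hypothetical hook (eager policy only): clean one dirty line in an LRU position."""
        if self.policy != "eager":
            return
        for lines in self.sets:
            if lines:
                lru_tag = next(iter(lines))
                if lines[lru_tag]:
                    lines[lru_tag] = False  # write it back now; it stays cached, clean
                    self.eager_writes += 1
                    return


def run(policy):
    cache = Cache(num_sets=4, ways=2, policy=policy)
    for _ in range(2):              # store twice to lines 0..7, filling the cache
        for addr in range(8):
            cache.access(addr, is_store=True)
    for addr in range(8, 16):       # stream new data in; one idle bus slot per fetch
        cache.idle_bus_slot()
        cache.access(addr, is_store=False)
    return cache.store_writes, cache.eviction_writes, cache.eager_writes


for policy in Cache.POLICIES:
    print(policy, run(policy))
```

On this trace the tuples (store writes, eviction writes, eager writes) come out as (16, 0, 0) for write-through, (0, 8, 0) for writeback, and (0, 3, 5) for the eager variant. Writeback and eager both issue 8 memory writes instead of 16. In the eager case, 5 of those 8 writes are moved out of eviction time, where they would otherwise compete with the streaming fetches. The sketch models only this redistribution effect, not timing or bus contention.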