Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

Hardware techniques for improving the performance of caches are presented. Miss caching places a small, fully associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a 1-cycle miss penalty. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches. Victim caching is an improvement to miss caching in that it loads the small fully associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching. Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data are placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. An extension to the basic stream buffer, called a multiway stream buffer, is introduced.<<ETX>>