
An architecture for software-controlled data prefetching

Alexander C. Klaiber, Henry M. Levy
University of Washington, Seattle, WA 98195

This paper describes an architecture and related compiler support for software-controlled data prefetching, a technique to hide memory latency in high-performance processors. At compile time, the compiler inserts FETCH instructions into the instruction stream, based on anticipated data references and detailed information about the memory system. At run time, a separate functional unit in the CPU, the fetch unit, interprets these instructions and initiates the appropriate memory reads. Prefetched data is kept in a small, fully-associative cache, called the fetch buffer, to reduce contention with the conventional direct-mapped cache. We also introduce a prewriteback technique that can reduce the impact of stalls caused by replacement writebacks in the cache. A detailed hardware model is presented and the required compiler support is developed. Simulations based on a MIPS processor model show that this technique can dramatically reduce on-chip cache miss ratios and average observed memory latency for scientific loops, at only a slight cost in total memory traffic.
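To make the interplay of FETCH instructions, the fetch unit, and the fetch buffer concrete, here is a minimal toy model. It is a sketch under stated assumptions, not the authors' hardware model or compiler. It assumes 32-byte cache lines, 8-byte array elements starting at address 0, a 20-cycle memory latency, a 64-set direct-mapped cache, an 8-entry LRU fetch buffer whose entries serve loads directly, and a loop body costing 4 cycles per iteration. The names `MemSystem`, `fetch`, `load`, `prefetch_distance` and `run_loop` are hypothetical. The "compiler" here simply issues a FETCH for element `i + d` in iteration `i`, with the distance `d` taken as the memory latency divided by the loop-body time, rounded up. Prewriteback is not modeled.

```python
import math
from collections import OrderedDict

# All parameters below are illustrative assumptions, not values from the paper.
LINE_BYTES = 32  # cache-line size in bytes
ELEM_BYTES = 8   # array element size (e.g. a double); array assumed to start at address 0


def prefetch_distance(latency, cycles_per_iter):
    """Iterations ahead a FETCH must be issued so the line arrives in time."""
    return math.ceil(latency / cycles_per_iter)


class MemSystem:
    """Toy direct-mapped cache plus a small fully-associative fetch buffer."""

    def __init__(self, cache_sets=64, fb_entries=8, latency=20):
        self.tags = [None] * cache_sets  # direct-mapped cache: one tag per set
        self.fb = OrderedDict()          # fetch buffer: line -> ready time, LRU order
        self.fb_entries = fb_entries
        self.latency = latency
        self.now = 0     # current cycle
        self.stall = 0   # cycles spent waiting on memory
        self.misses = 0  # demand misses in both cache and fetch buffer

    def _cache_hit(self, line):
        return self.tags[line % len(self.tags)] == line

    def fetch(self, addr):
        """Non-blocking FETCH: start reading a line into the fetch buffer."""
        line = addr // LINE_BYTES
        if self._cache_hit(line) or line in self.fb:
            return  # already present or in flight: drop the redundant FETCH
        if len(self.fb) == self.fb_entries:
            self.fb.popitem(last=False)  # evict the least recently used entry
        self.fb[line] = self.now + self.latency

    def load(self, addr):
        """Blocking load: stall until the data is available."""
        line = addr // LINE_BYTES
        if self._cache_hit(line):
            return
        if line in self.fb:
            self.fb.move_to_end(line)  # mark most recently used; data stays in the buffer
            wait = max(0, self.fb[line] - self.now)
        else:
            self.misses += 1
            wait = self.latency
            self.tags[line % len(self.tags)] = line  # fill the direct-mapped cache
        self.stall += wait
        self.now += wait

    def tick(self, cycles):
        """Advance time for non-memory work in the loop body."""
        self.now += cycles


def run_loop(n, work=4, latency=20, use_fetch=True):
    """Simulate `for i in range(n): s += a[i]`, optionally with FETCH a[i + d]."""
    mem = MemSystem(latency=latency)
    d = prefetch_distance(latency, work)
    for i in range(n):
        if use_fetch and i + d < n:
            mem.fetch((i + d) * ELEM_BYTES)  # compiler-inserted FETCH
        mem.load(i * ELEM_BYTES)
        mem.tick(work)
    return mem.misses, mem.stall


if __name__ == "__main__":
    print("without FETCH (misses, stall):", run_loop(64, use_fetch=False))
    print("with FETCH    (misses, stall):", run_loop(64, use_fetch=True))
```

On this 64-element sweep the model reports 16 demand misses and 320 stall cycles without FETCH, versus 1 miss (the first line, which no FETCH covers) and 20 stall cycles with it. Keeping prefetched lines in the separate buffer means they never displace data in the direct-mapped cache. The model lets the fetch unit drop FETCHes to a line already in flight; a real compiler would more likely emit one FETCH per cache line.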

Henry M. Levy | Alexander C. Klaiber
