Reducing memory latency via non-blocking and prefetching caches
[1] Gurindar S. Sohi et al. High-bandwidth data memory systems for superscalar processors, 1991, ASPLOS IV.
[2] Anoop Gupta et al. Performance evaluation of memory consistency models for shared-memory multiprocessors, 1991, ASPLOS IV.
[3] Michael J. Flynn et al. Write caches as an alternative to write buffers, 1991.
[4] Sanjay M. Krishnamurthy et al. A brief survey of papers on scheduling for pipelined processors, 1990, SIGPLAN Notices.
[5] Norman P. Jouppi et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[6] Wen-Hann Wang et al. Multilevel Cache Hierarchies: Organizations, Protocols, and Performance, 1989, J. Parallel Distrib. Comput.
[7] Alan Jay Smith et al. Branch Prediction Strategies and Branch Target Buffer Design, 1984, Computer.
[8] David Kroft et al. Lockup-free instruction fetch/prefetch cache organization, 1981, ISCA '81.
[9] Edward H. Gornish et al. Compiler-directed data prefetching in multiprocessors with memory hierarchies, 1990.
[10] Alan Jay Smith et al. Cache Memories, 1982, CSUR.
[11] Suneel Jain et al. Circular scheduling: a new technique to perform software pipelining, 1991, PLDI '91.
[12] Alexander V. Veidenbaum et al. Compiler-directed data prefetching in multiprocessors with memory hierarchies, 1990.
[13] Henry M. Levy et al. An Architecture for Software-Controlled Data Prefetching, 1991, ISCA.
[14] John L. Hennessy et al. The priority-based coloring approach to register allocation, 1990, TOPLAS.
[15] Jean-Loup Baer et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[16] David A. Patterson et al. Computer Architecture: A Quantitative Approach, 1990.
[17] Manoj Franklin et al. High-bandwidth data memory systems for superscalar processors, 1991.
[18] Anoop Gupta et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors, 1991, J. Parallel Distrib. Comput.
[19] Ken Kennedy et al. Software methods for improvement of cache performance on supercomputer applications, 1989.
[20] Norman P. Jouppi et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, ISCA 1990.
[21] Anoop Gupta et al. Hiding memory latency using dynamic scheduling in shared-memory multiprocessors, 1992, ISCA '92.
[22] Anoop Gupta et al. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results, 1989, ISCA '89.