The declining effectiveness of dynamic caching for general-purpose microprocessors
[1] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[2] Sally A. McKee,et al. Access ordering and memory-conscious cache utilization , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[3] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[4] Laszlo A. Belady,et al. A Study of Replacement Algorithms for a Virtual-Storage Computer , 1966, IBM Syst. J.
[5] Scott A. Mahlke,et al. Dynamic memory disambiguation using the memory conflict buffer , 1994, ASPLOS VI.
[6] Edward S. Davidson,et al. Analysis of memory referencing behavior for design of local memories , 1988, [1988] The 15th Annual International Symposium on Computer Architecture. Conference Proceedings.
[7] James R. Larus,et al. Wisconsin Architectural Research Tool Set , 1993, CARN.
[8] Tzi-cker Chiueh,et al. Sunder: a programmable hardware prefetch architecture for numerical loops , 1994, Proceedings of Supercomputing '94.
[9] Janak H. Patel,et al. Data prefetching in multiprocessor vector cache memories , 1991, ISCA '91.
[10] Anne Rogers,et al. Software support for speculative loads , 1992, ASPLOS V.
[11] David Kroft,et al. Lockup-free instruction fetch/prefetch cache organization , 1981, ISCA '81.
[12] James E. Smith,et al. Decoupled access/execute computer architectures , 1984, TOCS.
[13] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[14] Jean-Loup Baer,et al. Reducing memory latency via non-blocking and prefetching caches , 1992, ASPLOS V.
[15] W. Jalby,et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts , 1993, Supercomputing '93.
[16] J.W.C. Fu,et al. Stride Directed Prefetching In Scalar Processors , 1992, [1992] Proceedings the 25th Annual International Symposium on Microarchitecture MICRO 25.
[17] Monica D. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991 .
[18] John Paul Shen,et al. Speculative disambiguation: a compilation technique for dynamic memory disambiguation , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[19] Scott A. Mahlke,et al. Data access microarchitectures for superscalar processors with compiler-assisted data prefetching , 1991, MICRO 24.
[20] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[21] Todd M. Austin,et al. Knapsack: a Zero-cycle Memory Hierarchy Component , 1993 .
[22] N. S. Barnett,et al. Private communication , 1969 .
[23] Richard M. Karp,et al. Index Register Allocation , 1966, JACM.
[24] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[25] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[26] James E. Smith,et al. PowerPC 601 and Alpha 21064: a tale of two RISCs , 1994, Computer.
[27] Andrew R. Pleszkun,et al. PIPE: a VLSI decoupled architecture , 1985, ISCA '85.
[28] Alan Jay Smith,et al. Bibliography and reading on CPU cache memories and related topics , 1986, CARN.
[29] Matthew T. O'Keefe,et al. Reducing memory traffic with CRegs , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.
[30] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[31] David Keppel,et al. Shade: a fast instruction-set simulator for execution profiling , 1994, SIGMETRICS.
[32] James E. Smith,et al. The ZS-1 central processor , 1987, ASPLOS.
[33] Richard M. Russell,et al. The CRAY-1 computer system , 1978, CACM.
[34] James R. Larus,et al. Tempest and typhoon: user-level shared memory , 1994, ISCA '94.
[35] Santosh G. Abraham,et al. Efficient simulation of caches under optimal replacement with applications to miss characterization , 1993, SIGMETRICS '93.
[36] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[37] H. Levy,et al. An architecture for software-controlled data prefetching , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.
[38] Jean-Loup Baer,et al. A performance study of software and hardware data prefetching schemes , 1994, ISCA '94.
[39] A. Gupta,et al. The Stanford FLASH multiprocessor , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[40] Peter Yan-Tek Hsu. Designing the TFP microprocessor , 1994, IEEE Micro.
[41] H. T. Kung. Memory requirements for balanced computer architectures , 1986, ISCA '86.
[42] Anoop Gupta,et al. Hiding memory latency using dynamic scheduling in shared-memory multiprocessors , 1992, ISCA '92.