Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching
[1] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[2] Josep Torrellas, et al. False Sharing and Spatial Locality in Multiprocessor Caches, 1994, IEEE Trans. Computers.
[3] Jean-Loup Baer, et al. A performance study of software and hardware data prefetching schemes, 1994, ISCA '94.
[4] Jean-Loup Baer, et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[5] Anoop Gupta, et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors, 1991, J. Parallel Distributed Comput.
[6] Janak H. Patel, et al. Data prefetching in multiprocessor vector cache memories, 1991, ISCA '91.
[7] Cezary Dubnicki, et al. Adjustable Block Size Coherent Caches, 1992, Proceedings of the 19th Annual International Symposium on Computer Architecture.
[8] Todd C. Mowry, et al. Tolerating latency through software-controlled data prefetching, 1994.
[9] Anoop Gupta, et al. Performance evaluation of hybrid hardware and software distributed shared memory protocols, 1994, ICS '94.
[10] Scott A. Mahlke, et al. Data access microarchitectures for superscalar processors with compiler-assisted data prefetching, 1991, MICRO 24.