Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching
[1] Anoop Gupta,et al. Analysis of cache invalidation patterns in multiprocessors , 1989, ASPLOS III.
[2] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[3] Henry M. Levy,et al. An Architecture for Software-Controlled Data Prefetching , 1991, ISCA.
[4] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[5] Henry M. Levy,et al. An architecture for software-controlled data prefetching , 1991, ISCA '91.
[6] Thomas J. LeBlanc,et al. Adjustable block size coherent caches , 1992, ISCA '92.
[7] Stephen R. Goldschmidt,et al. Simulation of multiprocessors: accuracy and performance , 1993 .
[8] Scott A. Mahlke,et al. Data access microarchitectures for superscalar processors with compiler-assisted data prefetching , 1991, MICRO 24.
[9] Todd C. Mowry,et al. Tolerating latency through software-controlled data prefetching , 1994 .
[10] Michel Dubois,et al. Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors , 2006, International Conference on Parallel Processing.
[11] Alexander V. Veidenbaum,et al. Compiler-directed data prefetching in multiprocessors with memory hierarchies , 1990, ICS '90.
[12] Anoop Gupta,et al. Performance evaluation of hybrid hardware and software distributed shared memory protocols , 1994, ICS '94.
[13] Josep Torrellas,et al. False Sharing and Spatial Locality in Multiprocessor Caches , 1994, IEEE Trans. Computers.
[14] Jean-Loup Baer,et al. A performance study of software and hardware data prefetching schemes , 1994, ISCA '94.
[15] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.
[16] Janak H. Patel,et al. Data prefetching in multiprocessor vector cache memories , 1991, ISCA '91.
[17] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, ISCA 1990.
[18] John L. Hennessy,et al. The performance advantages of integrating block data transfer in cache-coherent multiprocessors , 1994, ASPLOS VI.
[19] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[20] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[21] Anoop Gupta,et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors , 1991, J. Parallel Distributed Comput..