A memory controller for improved performance of streamed computations on symmetric multiprocessors
[1] James R. Goodman,et al. The declining effectiveness of dynamic caching for general-purpose microprocessors , 1995 .
[2] B. Ramakrishna Rau,et al. Pseudo-randomly interleaved memory , 1991, ISCA '91.
[3] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[4] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[5] Eduard Ayguadé,et al. Increasing the number of strides for conflict-free vector access , 1992 .
[6] Eduard Ayguadé,et al. Increasing the number of strides for conflict-free vector access , 1992, ISCA '92.
[7] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[8] Sally A. McKee,et al. Experimental implementation of dynamic access ordering , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[9] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[10] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[11] Sally A. McKee,et al. Increasing Memory Bandwidth for Vector Computations , 1994, Programming Languages and System Architectures.
[12] Steven A. Moyer,et al. Access Ordering and Effective Memory Bandwidth , 1993 .
[13] Gurindar S. Sohi,et al. High-bandwidth data memory systems for superscalar processors , 1991, ASPLOS IV.
[14] Manuel E. Benitez,et al. Code generation for streaming: an access/execute mechanism , 1991, ASPLOS IV.
[15] Tzi-cker Chiueh,et al. Sunder: a programmable hardware prefetch architecture for numerical loops , 1994, Proceedings of Supercomputing '94.
[16] Zhiyuan Li,et al. An Empirical Study of the Workload Distribution under Static Scheduling , 1994, 1994 International Conference on Parallel Processing Vol. 2.
[17] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[18] David T. Harper,et al. Increased Memory Performance During Vector Accesses Through the use of Linear Address Transformations , 1992, IEEE Trans. Computers.
[19] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[20] Q. S. Gao. The Chinese Remainder Theorem And The Prime Memory System , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[21] Ken Kennedy,et al. Blocking Linear Algebra Codes for Memory Hierarchies , 1989, PPSC.