论文信息 - Memory access scheduling

Memory access scheduling

The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the "3-D" structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and different rows within a bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40% for traces from five media benchmarks. Aggressive reordering, in which operations are scheduled to optimize memory bandwidth, improves bandwidth by 93% for the same set of applications. Memory access scheduling is particularly important for media processors where it enables the processor to make the most efficient use of scarce memory bandwidth.

[1] David Kroft,et al. Lockup-free instruction fetch/prefetch cache organization , 1998, ISCA '81.

[2] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.

[3] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[4] Takeo Kanade,et al. Development of a video-rate stereo machine , 1995, Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human Robot Interaction and Cooperative Robots.

[5] Sally A. McKee,et al. Access ordering and memory-conscious cache utilization , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.

[6] Michael D. Smith,et al. Geust Editorial: Media processing: a new design target , 1996, IEEE Micro.

[7] Fong Pong,et al. Missing the Memory Wall: The Case for Processor/Memory Integration , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[8] Richard Crisp,et al. Direct RAMbus technology: the new main memory standard , 1997, IEEE Micro.

[9] Christoforos E. Kozyrakis,et al. A case for intelligent RAM , 1997, IEEE Micro.

[10] William J. Dally,et al. A bandwidth-efficient architecture for media processing , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[11] Lockup-free instruction fetch/prefetch cache organization , 1981, ISCA '98.

[12] J. Emer,et al. A characterization of processor performance in the VAX-11/780 , 1998, ISCA '98.

[13] Mateo Valero,et al. Command vector memory systems: high performance at low cost , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).

[14] S. Miura,et al. Access optimizer to overcome the "future walls of embedded DRAMs" in the era of systems on silicon , 1999, 1999 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC. First Edition (Cat. No.99CH36278).

[15] Norman P. Jouppi,et al. Performance of image and video processing with general-purpose processors and media ISA extensions , 1999, ISCA.

[16] Erik Brunvand,et al. Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[17] Trevor N. Mudge,et al. A performance comparison of contemporary DRAM architectures , 1999, ISCA.

[18] Sally A. McKee,et al. Access order and effective bandwidth for streams on a Direct Rambus memory , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[19] Sally A. McKee,et al. Design of a parallel vector access unit for SDRAM memory systems , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).