A Burst Scheduling Access Reordering Mechanism

Access reordering mechanisms exploit the nonuniform latencies of SDRAM devices by altering the sequence of main memory accesses to reduce the observed access latency. This work proposes the burst scheduling access reordering mechanism and, using a revised M5 simulator with an accurate SDRAM module, compares it to conventional in-order memory scheduling as well as to existing academic and industrial access reordering mechanisms. With burst scheduling, memory accesses to the same row of the same bank are clustered into bursts to maximize bus utilization of the SDRAM device. Subject to a static threshold, memory reads are allowed to preempt ongoing writes to reduce read latency, while qualified writes are piggybacked at the end of bursts to exploit row locality in writes and to prevent write queue saturation. The performance improvements contributed by read preemption and by write piggybacking are identified separately. Simulation results show that burst scheduling reduces the average execution time of selected SPEC CPU2000 benchmarks by 21% over conventional bank in-order memory scheduling. Burst scheduling also outperforms Intel's patented out-of-order memory scheduling and the row hit access reordering mechanism by 11% and 6%, respectively.
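The clustering and piggybacking ideas described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Access` record, the function names, and the interpretation of the static threshold as a write queue occupancy limit are all assumptions made for illustration, and SDRAM timing is not modeled at all.

```python
from collections import OrderedDict
from typing import NamedTuple


class Access(NamedTuple):
    """Hypothetical memory access record (not taken from the paper)."""
    kind: str  # "R" for read, "W" for write
    bank: int
    row: int
    col: int


def form_bursts(reads):
    """Cluster reads that target the same row of the same bank.

    Bursts are keyed by (bank, row) and kept in order of first arrival;
    accesses inside a burst keep their arrival order.
    """
    bursts = OrderedDict()
    for acc in reads:
        bursts.setdefault((acc.bank, acc.row), []).append(acc)
    return bursts


def piggyback_writes(bursts, writes):
    """Append each write to the end of a burst on the same bank and row.

    Returns the writes that matched no burst and stay in the write queue.
    """
    leftover = []
    for w in writes:
        key = (w.bank, w.row)
        if key in bursts:
            bursts[key].append(w)
        else:
            leftover.append(w)
    return leftover


def read_may_preempt(write_queue_len, threshold):
    """Assumed policy: a read may preempt an ongoing write only while the
    write queue occupancy is below the static threshold."""
    return write_queue_len < threshold


reads = [Access("R", 0, 5, 0), Access("R", 1, 2, 0), Access("R", 0, 5, 8)]
writes = [Access("W", 0, 5, 16), Access("W", 2, 9, 0)]
bursts = form_bursts(reads)
leftover = piggyback_writes(bursts, writes)
```

In this example the two reads to bank 0, row 5 form one burst, the write to the same row is appended to its end, and the write to bank 2 remains queued because no burst targets its row.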
