Memory scheduling for modern microprocessors

As memory performance becomes increasingly important to overall system performance, so does the need to schedule memory operations carefully. This article describes the adaptive history-based (AHB) scheduler, which uses the history of recently scheduled operations to provide three conceptual benefits: (1) it lets the scheduler better reason about the delays that its scheduling decisions will cause, (2) it provides a mechanism for combining multiple constraints, which matters for increasingly complex DRAM structures, and (3) it lets the scheduler select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller. We have previously evaluated this scheduler in the context of the IBM Power5, where it improves performance over the state of the art by 15.6%, 9.9%, and 7.6% for the Stream, NAS, and commercial benchmarks, respectively.

This article expands our understanding of the AHB scheduler in several ways. Looking backward, we place the scheduler in the context of prior work that focused exclusively on avoiding bank conflicts, and we show that the AHB scheduler outperforms such schedulers on the IBM Power5, whose memory controller we argue is representative of future microprocessor memory controllers. Looking forward, we evaluate the scheduler in the context of future systems by varying a number of microarchitectural features and hardware parameters. For example, we show that the benefit of this scheduler increases as we move to multithreaded environments.
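The three benefits listed above can be illustrated with a deliberately simplified sketch. This is not the AHB scheduler as built for the Power5; it only shows the general idea of scheduling from history. It keeps a short window of recently issued commands and uses it to estimate the delay a candidate would cause (benefit 1). That estimate is folded together with a second constraint into a single cost (benefit 2). The second constraint steers the issued commands toward the Read/Write mixture currently waiting in the queue (benefit 3). The class and method names, the penalty values, and the weighted-sum cost function are all hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass

# Hypothetical penalties in controller cycles; NOT real DRAM timings.
BANK_CONFLICT_PENALTY = 6   # candidate hits a bank used by a recent command
TURNAROUND_PENALTY = 3      # switching between Read and Write
MIX_WEIGHT = 4              # weight for deviating from the target Read/Write mix


@dataclass(frozen=True)
class Command:
    is_read: bool
    bank: int


class HistoryScheduler:
    """Illustrative history-based scheduler (hypothetical, simplified)."""

    def __init__(self, history_len: int = 2):
        # Only the most recent `history_len` issued commands are remembered.
        self.history: deque = deque(maxlen=history_len)

    def expected_delay(self, cmd: Command) -> int:
        """Estimate the stall caused by issuing cmd after the recent history."""
        delay = 0
        if any(prev.bank == cmd.bank for prev in self.history):
            delay += BANK_CONFLICT_PENALTY
        if self.history and self.history[-1].is_read != cmd.is_read:
            delay += TURNAROUND_PENALTY
        return delay

    def mix_penalty(self, cmd: Command, target_read_frac: float) -> float:
        """Penalize choices that push the issued Read fraction away from the target."""
        window = list(self.history) + [cmd]
        read_frac = sum(c.is_read for c in window) / len(window)
        return MIX_WEIGHT * abs(read_frac - target_read_frac)

    def select(self, pending: list[Command]) -> Command:
        """Issue the pending command with the lowest combined cost and record it."""
        if not pending:
            raise ValueError("no pending commands")
        # Target mixture: fraction of Reads among the commands now waiting.
        target = sum(c.is_read for c in pending) / len(pending)
        # min() returns the first of several equally cheap candidates.
        best = min(pending,
                   key=lambda c: self.expected_delay(c) + self.mix_penalty(c, target))
        self.history.append(best)
        return best


if __name__ == "__main__":
    sched = HistoryScheduler(history_len=2)
    print(sched.select([Command(True, 0), Command(True, 1), Command(False, 2)]))
    print(sched.select([Command(True, 0), Command(True, 1), Command(False, 2)]))
    print(sched.select([Command(True, 0), Command(False, 2)]))
```

In this run, the second call avoids the Read to bank 0 because it would conflict with the command just issued to that bank. The third call issues the Write, despite its turnaround cost, because the only remaining Read targets a busy bank and would also push the Read fraction further from the queue's mixture. A weighted sum is only one simple way to combine constraints. The weights here are arbitrary and would need tuning against real DRAM timing parameters.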
