Parallelism-Aware Batch Scheduling: Paving the Way to High-Performance and Fair Memory Controllers

In modern processors, the DRAM system is shared among concurrently-executing threads. Memory requests from one thread can delay requests from other threads by causing bank, bus, and row-buffer conflicts. Conventional DRAM controllers are unaware of this inter-thread interference, which causes two problems. First, some threads are unfairly penalized and denied DRAM service for long periods of time. Second, as we show in our ISCA-35 paper, each thread's memory-level parallelism can be destroyed: a thread's outstanding requests that would have been serviced in parallel can effectively become serialized, exposing the latency of each request. As a result, both single-thread performance and system performance/fairness degrade.

Our ISCA-35 paper proposes parallelism-aware batch scheduling (PAR-BS), a new approach to designing a shared DRAM controller. PAR-BS is based on two new basic building blocks which collectively reduce inter-thread interference in DRAM, ensure fairness, and preserve the memory-level parallelism of each thread. As a result, PAR-BS reduces the memory-related stall time experienced by the threads. In addition, PAR-BS provides fairness, avoids starvation of any thread, and seamlessly incorporates support for system-level thread priorities. Our evaluations show that PAR-BS significantly improves both fairness and system performance compared to four previous DRAM controllers across a wide variety of workloads and systems.
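The two building blocks the paper refers to are request batching and parallelism-aware scheduling within a batch. The following is a minimal, illustrative sketch of how such a scheduler could be organized, assuming simplified Request and per-bank queue structures; the class names, the marking_cap parameter, and the exact priority order shown here are assumptions for illustration, not the paper's implementation.

    # Illustrative PAR-BS-style scheduler sketch (not the paper's code).
    # Building block 1: request batching -- mark the oldest requests of each
    # thread to each bank and service marked requests first, so no thread starves.
    # Building block 2: parallelism-aware ranking within a batch -- rank threads
    # shortest-job-first by their heaviest per-bank load, so one thread's requests
    # tend to be serviced in parallel across banks instead of being serialized.

    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class Request:
        arrival: int          # global arrival order
        thread: int           # issuing thread id
        bank: int             # target DRAM bank
        row: int              # target row (for row-hit checks)
        marked: bool = False  # belongs to the current batch

    class ParBsScheduler:
        def __init__(self, marking_cap: int = 5):
            self.marking_cap = marking_cap   # marked requests per thread per bank
            self.queues = defaultdict(list)  # bank id -> outstanding requests
            self.rank = {}                   # thread id -> rank (lower = higher priority)

        def add(self, req: Request):
            self.queues[req.bank].append(req)

        def _form_batch(self):
            # Mark up to marking_cap oldest requests per thread in every bank queue.
            for reqs in self.queues.values():
                per_thread = defaultdict(list)
                for r in sorted(reqs, key=lambda r: r.arrival):
                    per_thread[r.thread].append(r)
                for lst in per_thread.values():
                    for r in lst[: self.marking_cap]:
                        r.marked = True
            # Rank threads by their maximum marked load on any single bank:
            # lightly loaded threads finish the batch, and stop stalling, sooner.
            max_load = defaultdict(int)
            for reqs in self.queues.values():
                bank_load = defaultdict(int)
                for r in reqs:
                    if r.marked:
                        bank_load[r.thread] += 1
                for t, n in bank_load.items():
                    max_load[t] = max(max_load[t], n)
            order = sorted(max_load, key=max_load.get)
            self.rank = {t: i for i, t in enumerate(order)}

        def next_request(self, bank: int, open_row: int):
            # Start a new batch once every marked request has been serviced.
            if not any(r.marked for q in self.queues.values() for r in q):
                self._form_batch()
            reqs = self.queues[bank]
            if not reqs:
                return None
            # Priority: marked first, then row hits, then thread rank, then oldest.
            best = min(reqs, key=lambda r: (not r.marked,
                                            r.row != open_row,
                                            self.rank.get(r.thread, len(self.rank)),
                                            r.arrival))
            reqs.remove(best)
            return best

Capping the number of marked requests per thread per bank bounds how long any batch can delay other requests, which is what avoids starvation, while the within-batch ranking keeps each thread's requests overlapping across banks so their latencies are not exposed serially.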
