In a chip multiprocessor (CMP) system, where multiple on-chip cores share a common memory interface, simultaneous memory requests from different threads can interfere with each other. Unfortunately, conventional memory scheduling techniques only try to optimize for overall data throughput and do not account for this inter-thread interference. Therefore, different threads running concurrently on the same chip can experience widely different memory system performance: one thread can experience a severe slowdown or starvation while another is unfairly prioritized by the memory scheduler. Our MICRO-40 paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides performance fairness to threads sharing the DRAM memory system. The key idea of the proposed scheduler is to equalize the DRAM-related slowdown experienced by each thread due to interference from other threads, without hurting overall system performance. To do so, STFM estimates at run-time each thread’s slowdown due to sharing the DRAM system and prioritizes memory requests of the threads that are slowed down the most. Unlike previous approaches to DRAM scheduling, STFM takes into account the inherent memory characteristics of each thread and therefore does not unfairly penalize threads that use the DRAM system without interfering with others. We show how STFM can be configured by the system software to control unfairness and to enforce thread priorities. Our results show that STFM significantly reduces the unfairness in the DRAM system while also improving system throughput on a wide variety of workloads and CMP systems. For example, averaged over 32 different workloads running on an 8-core CMP, the ratio between the highest DRAM-related slowdown and the lowest DRAM-related slowdown falls from 5.26X to 1.4X, while system throughput improves by 7.6%.
We qualitatively and quantitatively compare STFM to one new and three previously proposed memory access scheduling algorithms, including Network Fair Queueing. Our results show that STFM provides the best fairness, system throughput, and scalability.
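The prioritization idea described above can be illustrated with a minimal sketch. It is not the scheduler's actual hardware logic. It assumes each thread's slowdown is the ratio of its estimated memory stall time when sharing DRAM to its estimated stall time when running alone, and that unfairness is the ratio of the highest to the lowest slowdown (the metric quoted in the abstract). The names `ThreadStats`, `unfairness`, `pick_thread_to_prioritize`, and the `threshold` parameter are hypothetical, and returning `None` when unfairness is acceptable is an assumption about how a throughput-oriented baseline policy would take over.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ThreadStats:
    """Hypothetical per-thread estimates kept by the scheduler."""
    stall_shared: float  # estimated memory stall time while sharing DRAM
    stall_alone: float   # estimated memory stall time if the thread ran alone

    @property
    def slowdown(self) -> float:
        # Assumed definition: DRAM-related slowdown caused by sharing.
        return self.stall_shared / self.stall_alone


def unfairness(threads: Dict[int, ThreadStats]) -> float:
    """Ratio of the highest slowdown to the lowest slowdown."""
    slowdowns = [t.slowdown for t in threads.values()]
    return max(slowdowns) / min(slowdowns)


def pick_thread_to_prioritize(
    threads: Dict[int, ThreadStats], threshold: float
) -> Optional[int]:
    """Return the id of the most slowed-down thread when unfairness exceeds
    the (assumed) software-set threshold; otherwise return None so that a
    baseline throughput-oriented policy can decide."""
    if unfairness(threads) > threshold:
        return max(threads, key=lambda tid: threads[tid].slowdown)
    return None
```

For example, with thread 0 at `ThreadStats(500, 100)` and thread 1 at `ThreadStats(120, 100)`, the slowdowns are 5.0 and 1.2, so any threshold below their ratio causes thread 0's requests to be favored. A lower threshold enforces fairness more aggressively, which is one plausible way system software could control unfairness.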