Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Shared Memory Controllers

Uncontrolled interthread interference in main memory can destroy an individual thread's memory-level parallelism: requests from the same thread whose latencies would otherwise have largely overlapped are effectively serialized, which reduces single-thread performance. The parallelism-aware batch scheduler preserves each thread's memory-level parallelism, ensures fairness and starvation freedom, and supports system-level thread priorities.
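The serialization effect can be illustrated with a toy model. The sketch below is not the paper's scheduler. It assumes two threads (A and B) that each issue one request to each of two banks, a fixed service time of one unit per request, and a thread that stalls until its last outstanding request completes. The function name `stall_times` and the queue layouts are hypothetical, chosen only for illustration.

```python
def stall_times(bank_queues, service_time=1):
    """Return {thread: time at which its last request finishes}.

    bank_queues holds one list per bank, giving the order in which that
    bank services requests. Banks work in parallel; each bank handles
    its own queue one request at a time (toy assumption).
    """
    finish = {}
    for queue in bank_queues:
        for position, thread in enumerate(queue, start=1):
            done = position * service_time
            # A thread stalls until its slowest request completes.
            finish[thread] = max(finish.get(thread, 0), done)
    return finish


# Each bank picks its own order: A's two requests finish at different times.
interleaved = [["A", "B"], ["B", "A"]]

# Both banks service the same thread first: A's requests overlap.
thread_ranked = [["A", "B"], ["A", "B"]]

print(stall_times(interleaved))    # {'A': 2, 'B': 2}
print(stall_times(thread_ranked))  # {'A': 1, 'B': 2}
```

Under these assumptions, the uncoordinated order stalls both threads for two units, while servicing the same thread first in both banks lets thread A finish after one unit without delaying thread B, which is the kind of overlap a parallelism-aware policy aims to preserve.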
