Memory controller design under cloud workloads

This work studies the behavior of state-of-the-art memory controller designs when executing scale-out workloads. It considers memory scheduling techniques, memory page management policies, the number of memory channels, and the address mapping scheme used. Experimental measurements demonstrate: 1) Several recently proposed memory scheduling policies are not a good match for these scale-out workloads. 2) The relatively simple First-Ready-First-Come- First-Served (FR-FCFS) policy performs consistently better, and 3) for most of the studied workloads, the even simpler First-Come-First-Served scheduling policy is within 1% of FRFCFS. 4) Increasing the number of memory channels offers negligible performance benefits, e.g., performance improves by 1.7% on average for 4-channels vs. 1-channel. 5) 77%- 90% of DRAM rows activations are accessed only once before closure. These observation can guide future development and optimization of memory controllers for scale-out workloads.

[1]  Tao Li,et al.  Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[2]  Ying Xu,et al.  Prediction in Dynamic SDRAM Controller Policies , 2009, SAMOS.

[3]  James E. Smith,et al.  Fair Queuing Memory Systems , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[4]  In-Cheol Park,et al.  History-based memory mode prediction for improving memory performance , 2002, Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS '03..

[5]  Babak Falsafi,et al.  Scale-out processors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[6]  Onur Mutlu,et al.  Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.

[7]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[8]  Natalie D. Enright Jerger,et al.  Achieving predictable performance through better memory controller placement in many-core CMPs , 2009, ISCA '09.

[9]  Mor Harchol-Balter,et al.  ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[10]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[11]  Scott Rixner,et al.  Memory Controller Optimizations for Web Servers , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[12]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[13]  Eduardo F. Morales,et al.  An Introduction to Reinforcement Learning , 2011 .

[14]  Onur Mutlu,et al.  Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems , 2007, USENIX Security Symposium.

[15]  R. Ramesh,et al.  Phase Aware Memory Scheduling , 2013 .

[16]  William J. Dally,et al.  Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[17]  Thomas F. Wenisch,et al.  SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.

[18]  MutluOnur,et al.  Self-Optimizing Memory Controllers , 2008 .

[19]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[20]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[21]  Mor Harchol-Balter,et al.  Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[22]  Krisztián Flautner,et al.  PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor , 2006, ASPLOS XII.

[23]  David W. Nellans,et al.  Prediction Based DRAM Row-Buffer Management in the Many-Core Era , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[24]  Onur Mutlu,et al.  Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.

[25]  Babak Falsafi,et al.  Toward Dark Silicon in Servers , 2011, IEEE Micro.

[26]  Luiz André Barroso,et al.  Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[27]  Faye A. Briggs,et al.  A study of performance impact of memory controller features in multi-processor server environment , 2004, WMPI '04.

[28]  Steven A. Przybylski,et al.  Cache and memory hierarchy design: a performance-directed approach , 1990 .

[29]  Zhimin Zhang,et al.  RBPP: A row based DRAM page policy for the many-core era , 2014, 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).

[30]  Xi Zhang,et al.  Security Memory System for Mobile Device or Computer Against Memory Attacks , 2013, ISCTCS.

[31]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[32]  Kushagra Vaid,et al.  Web search using mobile cores: quantifying and mitigating the price of efficiency , 2010, ISCA.