Exploring Allocation Policies in Disaggregated Non-Volatile Memories

Many modern applications have increasingly large memory footprints, driving system memory capacities ever higher. However, because of the diversity of applications that run on High-Performance Computing (HPC) systems, memory utilization can fluctuate widely from one application to another, leaving memory underutilized when many jobs have small footprints. In addition, because memory chips are collocated with the compute nodes, message-passing APIs are needed to share information between nodes. To address some of these issues, vendors are exploring disaggregated, memory-centric systems. In this type of organization, discrete nodes reserved solely for memory are shared across many compute nodes. Owing to their capacity, low power, and non-volatility, Non-Volatile Memories (NVMs) are ideal candidates for these memory nodes. Moreover, larger memory capacities open the door to different programming models (more shared-memory-style approaches), which are now being added to the C++ and Fortran language specifications. This paper proposes a simulation model for studying disaggregated memory architectures using a publicly available simulator, the Structural Simulation Toolkit (SST), and investigates various memory allocation policies.
