Page migration support for disaggregated non-volatile memories

As demand from memory-intensive applications continues to grow, the memory capacity of each computing node is expected to grow at a similar pace. In high-performance computing (HPC) systems, the memory capacity per compute node is sized for the most demanding application likely to run on the system, so the average capacity per node in future HPC systems is expected to grow significantly. However, because HPC systems run many applications with different capacity demands, a large percentage of the overall memory capacity will likely be underutilized: each memory module is effectively private to its computing node and cannot serve other nodes. As HPC systems move toward the exascale era, better memory utilization is therefore strongly desired. Moreover, upgrading a memory system requires significant effort. Disaggregated memory systems promise better utilization by defining regions of global memory, typically referred to as memory blades, that can be accessed by all computing nodes in the system. Such systems are expected to be built from dense, power-efficient memory technologies, which positions emerging non-volatile memories (NVMs) as their main building blocks. However, NVMs are slower than DRAM. It is therefore expected that each computing node will have a small local memory based on either HBM or DRAM, while a large shared NVM will be accessible by all nodes. Managing such a system, with both global and local memory, requires a novel hardware/software co-design that initiates page migration between global and local memory to maximize performance while still enabling access to the huge shared memory. In this paper, we provide support for page migration and investigate these memory management aspects along with the major system-level aspects that can affect design decisions in disaggregated NVM systems.
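
To make the local/global arrangement described above concrete, the following is a minimal sketch of one possible migration policy. It is not the mechanism proposed in this paper. It assumes a simple access-count threshold for promoting a page from shared NVM into local memory and evicts the least-accessed resident page when local memory is full. The class name `TwoTierMemory` and the parameters `local_capacity` and `hot_threshold` are hypothetical and chosen only for illustration.

```python
from collections import Counter


class TwoTierMemory:
    """Toy model: a small local memory in front of a large shared NVM pool.

    Assumption: promotion is triggered by a fixed access-count threshold.
    This is an illustrative policy, not the paper's design.
    """

    def __init__(self, local_capacity, hot_threshold):
        self.local_capacity = local_capacity  # pages that fit in local DRAM/HBM
        self.hot_threshold = hot_threshold    # NVM accesses before promotion
        self.local = {}                       # page -> accesses while resident
        self.global_hits = Counter()          # page -> accesses served from NVM
        self.migrations = 0                   # page moves in either direction

    def access(self, page):
        """Return 'local' or 'global' depending on where the access is served."""
        if page in self.local:
            self.local[page] += 1
            return "local"
        # Served from the shared NVM; count it toward the promotion threshold.
        self.global_hits[page] += 1
        if self.global_hits[page] >= self.hot_threshold:
            self._promote(page)
        return "global"

    def _promote(self, page):
        """Migrate a hot page into local memory, evicting a cold one if full."""
        if len(self.local) >= self.local_capacity:
            # Evict the resident page with the fewest local accesses.
            victim = min(self.local, key=self.local.get)
            del self.local[victim]
            self.migrations += 1  # victim moves back to shared NVM
        self.local[page] = 0
        del self.global_hits[page]
        self.migrations += 1      # hot page moves into local memory


mem = TwoTierMemory(local_capacity=2, hot_threshold=3)
served = [mem.access(1) for _ in range(4)]
print(served, mem.migrations)
```

In this sketch the first three accesses to a page are served from the shared NVM, and only later accesses benefit from local memory. A real design would also have to account for the cost of each migration, such as the data copy and the TLB invalidation, which this toy model only counts.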