HiNUMA: NUMA-Aware Data Placement and Migration in Hybrid Memory Systems

Non-uniform memory access (NUMA) architectures exhibit asymmetric memory access latencies across CPU nodes. Hybrid memory systems that combine non-volatile memory (NVM) with DRAM widen this spread further because of the relatively large performance gap between NVM and DRAM. Traditional NUMA memory management policies fail to manage hybrid memories effectively and may even hurt application performance. In this paper, we present HiNUMA, a new NUMA abstraction for memory allocation and migration in hybrid memory systems. For initial data placement, HiNUMA advocates NUMA topology-aware hybrid memory allocation policies. For memory migration at runtime, HiNUMA proposes a new NUMA balancing mechanism called HANB, which considers both data access frequency and memory bandwidth utilization to reduce the cost of memory accesses in hybrid memory systems. We evaluate the performance of HiNUMA with several typical workloads. Experimental results show that HiNUMA utilizes hybrid memories effectively and delivers much higher application performance than conventional NUMA memory management policies and other state-of-the-art work.
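To make the idea of weighing access frequency against bandwidth utilization concrete, the following is a minimal sketch of one possible migration decision. It is not HANB itself: the abstract does not specify the algorithm, so the `Page` record, the `pick_target` function, the per-node access counters, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical per-page record; not a structure defined by the paper.
    page_id: int
    node: int        # NUMA node that currently holds the page
    is_dram: bool    # True if the page resides in DRAM, False if in NVM
    accesses: dict   # CPU node -> access count in the last sampling window


def pick_target(page, bw_util, hot_threshold=64, bw_limit=0.8):
    """Return (node, "dram") as a migration target, or None to leave the page.

    bw_util maps (node, "dram") to a utilization fraction in [0, 1].
    hot_threshold and bw_limit are illustrative values, not from the paper.
    """
    total = sum(page.accesses.values())
    if total < hot_threshold:
        return None  # cold page: migration cost is unlikely to pay off

    # The node that issued the most accesses is the preferred home.
    preferred = max(page.accesses, key=page.accesses.get)
    if page.node == preferred and page.is_dram:
        return None  # already in fast memory on the right node

    # Only migrate if the target DRAM still has bandwidth headroom.
    if bw_util.get((preferred, "dram"), 1.0) < bw_limit:
        return (preferred, "dram")
    return None
```

In this sketch a hot page held in NVM on node 0 but accessed mostly from node 1 would be moved to node 1's DRAM, unless that DRAM is already close to its bandwidth limit, in which case the page stays where it is.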
