Performance characterization of a DRAM-NVM hybrid memory architecture for HPC applications using intel optane DC persistent memory modules

Non-volatile, byte-addressable memory (NVM) has been introduced by Intel in the form of NVDIMMs named Intel® Optane™ DC PMM. This memory module has the ability to persist the data stored in it without the need for power. This expands the memory hierarchy into a hybrid memory system due the differences in access latency and memory bandwidth from DRAM, which has been the predominant byte-addressable main memory technology. The Optane DC memory modules have up to 8x the capacity of DDR4 DRAM modules which can expand the byte-address space up to 6 TB per node. Many applications can now scale up the their problem size given such a memory system. We evaluate the capabilities of this DRAM-NVM hybrid memory system and its impact on High Performance Computing (HPC) applications. We characterize the Optane DC in comparison to DDR4 DRAM with a STREAM-like custom benchmark and measure the performance for HPC mini-apps like VPIC, SNAP, LULESH and AMG under different configurations of Optane DC PMMs. We find that Optane-only executions are slower in terms of execution time than DRAM-only and Memory-mode executions by a minimum of 2 to 16% for VPIC and maximum of 6x for LULESH.

[1]  Viktor Leis,et al.  Persistent Memory I/O Primitives , 2019, DaMoN.

[2]  Keshav Pingali,et al.  Single machine graph analytics on massive datasets using Intel optane DC persistent memory , 2019, Proc. VLDB Endow..

[3]  Hideto Hidaka,et al.  The cache DRAM architecture: a DRAM with an on-chip cache memory , 1990, IEEE Micro.

[4]  Brian Rogers,et al.  Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.

[5]  Christian Engelmann,et al.  Failures in Large Scale Systems: Long-term Measurement, Analysis, and Implications , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.

[6]  Gerhard Wellein,et al.  LIKWID: Lightweight Performance Tools , 2011, CHPC.

[7]  Hans-Juergen Boehm,et al.  Atlas: leveraging locks for non-volatile memory consistency , 2014, OOPSLA.

[8]  Sanjay Kumar,et al.  System software for persistent memory , 2014, EuroSys '14.

[9]  Ismail Oukid,et al.  Bridging the Latency Gap between NVM and DRAM for Latency-bound Operations , 2019, DaMoN.

[10]  Onur Mutlu,et al.  Memory scaling: A systems architecture perspective , 2013, 2013 5th IEEE International Memory Workshop.

[11]  V. E. Henson,et al.  BoomerAMG: a parallel algebraic multigrid solver and preconditioner , 2002 .

[12]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[13]  Dong Li,et al.  Early Evaluation of Intel Optane Non-Volatile Memory with HPC I/O Workloads , 2017, ArXiv.

[14]  Jeffrey S. Vetter,et al.  Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing , 2015, Computing in Science & Engineering.

[15]  Bingsheng He,et al.  NV-Tree: Reducing Consistency Cost for NVM-based Single Level Systems , 2015, FAST.

[16]  Dejan S. Milojicic,et al.  Optimizing Checkpoints Using NVM as Virtual Memory , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[17]  Eric Pop,et al.  Phase change materials and phase change memory , 2014 .

[18]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[19]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[20]  Chao Wang,et al.  NVMalloc: Exposing an Aggregate SSD Store as a Memory Partition in Extreme-Scale Machines , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.

[21]  Gerhard Wellein,et al.  LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments , 2010, 2010 39th International Conference on Parallel Processing Workshops.

[22]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.

[23]  Thomas F. Wenisch,et al.  Disaggregated memory for expansion and sharing in blade servers , 2009, ISCA '09.

[24]  MutluOnur,et al.  Architecting phase change memory as a scalable dram alternative , 2009 .

[25]  Xu Zhou,et al.  NV-process: a fault-tolerance process model based on non-volatile memory , 2012, APSys.

[26]  Josep Torrellas,et al.  AutoPersist: an easy-to-use Java NVM framework based on reachability , 2019, PLDI.

[27]  William Daughton,et al.  Advances in petascale kinetic plasma simulation with VPIC and Roadrunner , 2009 .

[28]  Xueti Tang,et al.  Spin-transfer torque magnetic random access memory (STT-MRAM) , 2013, JETC.

[29]  Xiao Liu,et al.  Basic Performance Measurements of the Intel Optane DC Persistent Memory Module , 2019, ArXiv.

[30]  SolihinYan,et al.  Scaling the bandwidth wall , 2009 .

[31]  Ian Karlin,et al.  LULESH 2.0 Updates and Changes , 2013 .

[32]  Ravi Nair,et al.  Evolution of Memory Architecture , 2015, Proceedings of the IEEE.

[33]  Robert Latham,et al.  A next-generation parallel file system for Linux cluster. , 2004 .