Evaluation of emerging memory technologies for HPC, data intensive applications

DRAM technology has several shortcomings in terms of performance, energy efficiency, and scaling. Several emerging memory technologies could compensate for these limitations by replacing or complementing DRAM in the memory subsystem. In this paper, we evaluate the impact of emerging technologies on HPC and data-intensive workloads by modeling a five-level hybrid memory hierarchy. Our results show that 1) an additional level of faster DRAM technology (EDRAM or HMC) interposed between the last-level cache and DRAM can improve performance and energy efficiency; 2) a non-volatile main memory (PCM, STT-RAM, or FeRAM) with a small DRAM acting as a cache can reduce cost and energy consumption at large capacities; and 3) a combination of the two approaches, which essentially replaces the traditional DRAM with a small EDRAM or HMC cache between the last-level cache and the non-volatile memory, can provide high capacity together with improved performance and energy efficiency. We also explore a hybrid DRAM-NVM design with a partitioned address space and find that it is only marginally beneficial compared to the simpler five-level design. Finally, we generalize our analysis and show the impact of emerging technologies across a range of latency and energy parameters.
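As a rough illustration of finding 1), the following sketch applies the standard serial average-memory-access-time recurrence to compare a DRAM-only memory against one with a faster cache level interposed below the last-level cache (LLC). It is not the paper's simulation methodology: the `Level` class, the functions `per_miss_cost` and `break_even_hit_rate`, and every latency, energy, and hit-rate value are hypothetical placeholders. The model covers only average latency and dynamic energy per LLC miss, so it does not capture the static/refresh energy and cost-per-bit effects that favor a non-volatile main memory in finding 2).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Level:
    """One memory level below the last-level cache (hypothetical model).

    Values passed to this class are illustrative placeholders,
    not parameters or results from the paper.
    """
    name: str
    latency_ns: float  # time for one lookup at this level
    energy_nj: float   # dynamic energy for one access at this level
    hit_rate: float    # local hit rate: fraction of requests reaching this level that hit


def per_miss_cost(levels: Sequence[Level]) -> tuple[float, float]:
    """Average latency (ns) and dynamic energy (nJ) per LLC miss.

    Assumes serial lookups: a request pays level i's cost and, with
    probability (1 - h_i), continues to level i + 1. The last level is
    the backing store and must always hit. Static/refresh power,
    write-backs, and fills are ignored.
    """
    if not levels or levels[-1].hit_rate != 1.0:
        raise ValueError("the last level must have hit_rate == 1.0")
    latency, energy = 0.0, 0.0
    # Fold from the backing store upward: T_i = t_i + (1 - h_i) * T_{i+1}.
    for level in reversed(levels):
        miss = 1.0 - level.hit_rate
        latency = level.latency_ns + miss * latency
        energy = level.energy_nj + miss * energy
    return latency, energy


def break_even_hit_rate(cache_latency_ns: float, memory_latency_ns: float) -> float:
    """Local hit rate above which an interposed cache lowers average latency.

    Derived from t_c + (1 - h) * t_m < t_m, which simplifies to h > t_c / t_m.
    """
    return cache_latency_ns / memory_latency_ns


if __name__ == "__main__":
    # Hypothetical values chosen only to keep the arithmetic easy to follow.
    dram = Level("DRAM", latency_ns=60.0, energy_nj=20.0, hit_rate=1.0)
    edram = Level("EDRAM cache", latency_ns=20.0, energy_nj=5.0, hit_rate=0.5)

    print(per_miss_cost([dram]))         # (60.0, 20.0)
    print(per_miss_cost([edram, dram]))  # (50.0, 15.0)
    print(break_even_hit_rate(edram.latency_ns, dram.latency_ns))
```

Under the serial-lookup assumption, the interposed level lowers average latency only when its local hit rate exceeds t_c/t_m, so a faster technology (smaller t_c) lowers the bar it must clear. Overlapped lookups, bandwidth limits, and write traffic would all shift this threshold.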
