CMH: compression management for improving capacity in the hybrid memory cube

The Hybrid Memory Cube (HMC) is a novel 3D memory architecture that improves bandwidth and saves energy. However, because of limits on the scalability and power density of the DRAM bit cell, the physical data capacity of an individual HMC is relatively modest and unlikely to grow significantly, which is likely to hinder adoption of the HMC for big data in high-performance computing. In this paper, we propose a new strategy to increase the effective data capacity of the HMC, called Compression Management for HMC (CMH). CMH is incorporated in the logic layer of the HMC. By selectively compressing data during transmission and storing the selectively compressed data in the 3D memory stack, CMH increases effective data capacity while also improving effective bandwidth. For several memory-intensive benchmarks, our results show that CMH reduces pressure on memory capacity by 64.4% and improves bandwidth by 42.4%. Similarly good results are observed for multi-programmed workloads: capacity pressure falls by 66.2% and bandwidth improves by 47.8%. Although compression adds latency, CMH mitigates the increase in transaction latency by introducing a small cache in the HMC logic layer to store compression metadata. The overhead in instructions per cycle (IPC) is a minimal 1.2% and 1.5% for single-core and multi-core workloads, respectively, so IPC remains stable and is not significantly harmed by the inclusion of compression.
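The idea of selective compression with per-block metadata can be illustrated with a minimal software sketch. This is not the CMH hardware design: the abstract does not specify the compression algorithm, block size, or selection rule, so zlib, the 64-byte block, the 0.75 threshold, and the `SelectiveStore` class are all assumptions made purely for illustration. A block is stored compressed only when compression saves enough space, and a small table records which form each block is in, playing the role of the metadata that CMH caches in the logic layer.

```python
import zlib

BLOCK_SIZE = 64    # assumed block size in bytes (not taken from the paper)
THRESHOLD = 0.75   # assumed rule: keep compressed form only if <= 75% of original


class SelectiveStore:
    """Hypothetical model of a memory that compresses blocks selectively."""

    def __init__(self):
        self.blocks = {}    # address -> stored payload (compressed or raw)
        self.metadata = {}  # address -> True if the payload is compressed

    def write(self, addr: int, data: bytes) -> None:
        compressed = zlib.compress(data)  # zlib is a stand-in compressor
        if len(compressed) <= THRESHOLD * len(data):
            # Compression pays off: store the smaller payload.
            self.blocks[addr] = compressed
            self.metadata[addr] = True
        else:
            # Incompressible data: store it raw and skip decompression on reads.
            self.blocks[addr] = data
            self.metadata[addr] = False

    def read(self, addr: int) -> bytes:
        payload = self.blocks[addr]
        # The metadata lookup decides whether decompression is needed.
        return zlib.decompress(payload) if self.metadata[addr] else payload

    def stored_bytes(self) -> int:
        return sum(len(p) for p in self.blocks.values())


if __name__ == "__main__":
    store = SelectiveStore()
    store.write(0, bytes(BLOCK_SIZE))                    # all zeros: compressible
    store.write(BLOCK_SIZE, bytes(range(BLOCK_SIZE)))    # distinct bytes: not compressible
    print(store.metadata, store.stored_bytes())
```

In this sketch, the metadata lookup on every read corresponds to the latency concern raised in the abstract, which is why keeping that metadata in a small, fast cache matters.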
