Paper Information - HoPE: Hot-Cacheline Prediction for Dynamic Early Decompression in Compressed LLCs

Data compression plays a pivotal role in improving system performance and reducing energy consumption, because it increases the logical effective capacity of a compressed memory system without physically increasing the memory size. However, data compression techniques incur some cost, such as non-negligible compression and decompression overhead. This overhead becomes more severe if compression is used in the cache. In this article, we aim to minimize the read-hit decompression penalty in compressed Last-Level Caches (LLCs) by speculatively decompressing frequently used cachelines. To this end, we propose a Hot-cacheline Prediction and Early decompression (HoPE) mechanism that consists of three synergistic techniques: Hot-cacheline Prediction (HP), Early Decompression (ED), and Hit-history-based Insertion (HBI). HP and HBI efficiently identify the hot compressed cachelines, while ED selectively decompresses hot cachelines, based on their size information. Unlike previous approaches, the HoPE framework considers the performance balance/tradeoff between the increased effective cache capacity and the decompression penalty. To evaluate the effectiveness of the proposed HoPE mechanism, we run extensive simulations on memory traces obtained from multi-threaded benchmarks running on a full-system simulation framework. We observe significant performance improvements over compressed cache schemes employing the conventional Least-Recently Used (LRU) replacement policy, the Dynamic Re-Reference Interval Prediction (DRRIP) scheme, and the Effective Capacity Maximizer (ECM) compressed cache management mechanism. Specifically, HoPE exhibits system performance improvements of approximately 11%, on average, over LRU, 8% over DRRIP, and 7% over ECM by reducing the read-hit decompression penalty by around 65%, over a wide range of applications.
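
The tradeoff described in the abstract, between keeping lines compressed for capacity and decompressing hot lines early to avoid the read-hit penalty, can be illustrated with a toy model. The sketch below is not the paper's algorithm: the abstract does not give the prediction rule, thresholds, or latencies, so the hit-count threshold, the 64-byte line size, the cycle counts, and the names `Line`, `CacheSet`, and `HOT_THRESHOLD` are all assumptions made for illustration. Eviction and the Hit-history-based Insertion (HBI) policy are omitted.

```python
from dataclasses import dataclass, field

# All constants below are illustrative assumptions, not values from the paper.
LINE_BYTES = 64              # uncompressed cacheline size
SET_BYTES = 4 * LINE_BYTES   # data capacity of one cache set
HOT_THRESHOLD = 2            # hits after which a line is treated as hot
HIT_CYCLES = 10              # base read-hit latency
DECOMP_CYCLES = 5            # extra latency to decompress on a read hit


@dataclass
class Line:
    tag: int
    comp_size: int           # bytes occupied while compressed
    hits: int = 0
    compressed: bool = True

    @property
    def footprint(self) -> int:
        """Bytes the line currently occupies in the set."""
        return self.comp_size if self.compressed else LINE_BYTES


@dataclass
class CacheSet:
    lines: dict = field(default_factory=dict)

    def used(self) -> int:
        return sum(line.footprint for line in self.lines.values())

    def insert(self, tag: int, comp_size: int) -> None:
        """Insert a compressed line; eviction is out of scope for this sketch."""
        if self.used() + comp_size > SET_BYTES:
            raise ValueError("set is full (eviction not modeled)")
        self.lines[tag] = Line(tag, comp_size)

    def read(self, tag: int) -> int:
        """Return the read-hit latency in cycles, then update hotness state."""
        line = self.lines[tag]
        latency = HIT_CYCLES + (DECOMP_CYCLES if line.compressed else 0)
        line.hits += 1
        self._maybe_early_decompress(line)
        return latency

    def _maybe_early_decompress(self, line: Line) -> None:
        """Store a hot line uncompressed, but only if the set has room.

        The size check stands in for the size-aware decision the abstract
        attributes to Early Decompression: decompressing a line costs
        (LINE_BYTES - comp_size) bytes of effective capacity.
        """
        if not line.compressed or line.hits < HOT_THRESHOLD:
            return
        extra = LINE_BYTES - line.comp_size
        if self.used() + extra <= SET_BYTES:
            line.compressed = False
```

In this toy model, a 16-byte compressed line in a mostly empty set costs 15, 15, and then 10 cycles over three consecutive reads, because it is stored uncompressed once it becomes hot. In a set already filled with eight 32-byte lines, the same access pattern stays at 15 cycles, since decompressing would exceed the set's capacity.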
