Efficient Similarity-aware Compression to Reduce Bit-writes in Non-Volatile Main Memory for Image-based Applications

Image bitmaps are widely used in in-memory applications and consume large amounts of storage space and energy. Compared with legacy DRAM, non-volatile memories (NVMs) are well suited to bitmap storage because of their advantages in capacity and power consumption. However, NVM writes incur higher latency and energy consumption than reads. Although compressing data on-the-fly during write accesses reduces the number of bit-writes in NVMs, existing precise and approximate compression schemes deliver limited performance improvements for bitmap data because of its irregular patterns and high variance. We observe that data containing bitmaps exhibit pixel-level similarity, since adjacent pixels tend to hold analogous contents. Exploiting this pixel-level similarity, we propose SimCom, an efficient similarity-aware compression scheme in the hardware layer that compresses the data of each write access on-the-fly. The idea behind SimCom is to compress contiguous similar words into pairs of base words and runs. With the aid of domain knowledge of images, SimCom adaptively selects an appropriate compression mode to achieve an efficient trade-off between image quality and memory performance. We implement SimCom on GEM5 with NVMain and evaluate its performance with real-world workloads. Our results demonstrate that, compared with the state-of-the-art FPC and BDI schemes, SimCom reduces write latency by 33.0% and 34.8% and saves 28.3% and 29.0% of energy, respectively, with a minor quality loss of 3%.
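
The core idea of pairing base words with runs can be illustrated with a minimal software sketch. This is not the SimCom hardware design: the abstract does not specify the word format, the similarity test, or the compression modes, so the 32-bit four-channel word layout, the per-channel `tolerance` parameter, and the function names `channels`, `similar`, `compress`, and `decompress` are all assumptions made for illustration.

```python
from typing import List, Tuple


def channels(word: int) -> Tuple[int, ...]:
    """Split a 32-bit word into four 8-bit channels (assumed pixel layout)."""
    return tuple((word >> shift) & 0xFF for shift in (24, 16, 8, 0))


def similar(base: int, word: int, tolerance: int) -> bool:
    """Assumed similarity test: every channel differs by at most `tolerance`."""
    return all(abs(a - b) <= tolerance
               for a, b in zip(channels(base), channels(word)))


def compress(words: List[int], tolerance: int) -> List[Tuple[int, int]]:
    """Collapse contiguous words similar to the current base into (base, run) pairs."""
    pairs: List[Tuple[int, int]] = []
    for word in words:
        if pairs and similar(pairs[-1][0], word, tolerance):
            base, run = pairs[-1]
            pairs[-1] = (base, run + 1)   # extend the current run
        else:
            pairs.append((word, 1))       # start a new run with this word as base
    return pairs


def decompress(pairs: List[Tuple[int, int]]) -> List[int]:
    """Expand (base, run) pairs; similar words are approximated by their base."""
    out: List[int] = []
    for base, run in pairs:
        out.extend([base] * run)
    return out


if __name__ == "__main__":
    line = [0x10203040, 0x11203040, 0x10213041, 0x80808080, 0x80808080]
    for base, run in compress(line, tolerance=2):
        print(f"base={base:#010x} run={run}")
```

In this sketch a tolerance of 0 degenerates to exact run-length encoding, while a larger tolerance trades image quality for fewer stored words, which loosely mirrors the quality versus performance trade-off described above.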
