Approximate memory compression for energy-efficiency

Memory subsystems are a major energy bottleneck in computing platforms due to frequent transfers between processors and off-chip memory. We propose approximate memory compression, a technique that leverages the intrinsic resilience of emerging workloads such as machine learning and data analytics to reduce off-chip memory traffic and energy. To realize approximate memory compression, we enhance the memory controller so that it is aware of memory regions containing approximation-resilient data, and so that it transparently compresses data written to these regions and decompresses data read from them. To provide control over approximations, the quality-aware memory controller conforms to a specified error constraint for each approximate memory region. We design a software interface that programmers can use to identify data structures that are resilient to approximations. We also propose a runtime quality control framework that automatically determines the error constraints for the identified data structures such that a given target application-level quality is maintained. We evaluate our proposal by implementing a hardware prototype using the Intel UniPHY-DDR3 memory controller and NIOS-II processor, a Hynix DDR3 DRAM module, and a Stratix-IV FPGA development board. Across a suite of 8 machine learning benchmarks, approximate memory compression achieves a 1.28× improvement in DRAM energy and a simultaneous 11.5% improvement in execution time, at the cost of a small (< 1.5%) loss in output quality.
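The abstract does not specify the compression scheme or the controller interface. As a rough illustration only, the Python sketch below models a controller that checks whether a block address falls inside a registered approximate region and, if so, applies a lossy base-delta encoding whose truncation step is sized to that region's error constraint. `ApproxRegion`, `ApproxMemoryController`, the base-delta scheme, and the 8-bit per-block header cost are all hypothetical assumptions made for this sketch, not the paper's design.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical metadata cost per compressed block (fields for k and delta width).
HEADER_BITS = 8

Block = Tuple[int, int, int, List[int]]  # (base, dropped bits k, delta width, deltas)


@dataclass(frozen=True)
class ApproxRegion:
    """Hypothetical descriptor for one approximation-resilient memory region."""
    base: int           # first byte address of the region
    size: int           # region length in bytes
    max_abs_error: int  # per-word error constraint for data in this region

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def bits_to_drop(max_abs_error: int) -> int:
    """Largest k such that truncating k low-order bits errs by at most max_abs_error."""
    k = 0
    # Truncating k bits of a non-negative integer loses at most 2**k - 1.
    while (1 << (k + 1)) - 1 <= max_abs_error:
        k += 1
    return k


def compress_block(words: Sequence[int], max_abs_error: int) -> Block:
    """Lossy base-delta encoding: keep the minimum word plus truncated deltas."""
    base = min(words)
    k = bits_to_drop(max_abs_error)
    quantized = [(w - base) >> k for w in words]  # deltas are >= 0 since base is the min
    width = max(quantized).bit_length()           # bits needed per quantized delta
    return base, k, width, quantized


def decompress_block(block: Block) -> List[int]:
    base, k, _width, quantized = block
    return [base + (q << k) for q in quantized]


def compressed_bits(block: Block, word_bits: int) -> int:
    _base, _k, width, quantized = block
    return word_bits + HEADER_BITS + width * len(quantized)


class ApproxMemoryController:
    """Toy model: compress writes to approximate regions and count DRAM traffic in bits."""

    def __init__(self, regions: List[ApproxRegion], word_bits: int = 32):
        self.regions = regions
        self.word_bits = word_bits
        self.store: Dict[int, Tuple[str, object]] = {}
        self.bits_transferred = 0

    def _region_for(self, addr: int) -> Optional[ApproxRegion]:
        for region in self.regions:
            if region.contains(addr):
                return region
        return None

    def write_block(self, addr: int, words: Sequence[int]) -> None:
        raw_bits = self.word_bits * len(words)
        region = self._region_for(addr)
        if region is not None:
            block = compress_block(words, region.max_abs_error)
            cost = compressed_bits(block, self.word_bits)
            if cost < raw_bits:  # otherwise fall back to raw storage
                self.store[addr] = ("compressed", block)
                self.bits_transferred += cost
                return
        self.store[addr] = ("raw", list(words))
        self.bits_transferred += raw_bits

    def read_block(self, addr: int) -> List[int]:
        kind, payload = self.store[addr]
        if kind == "compressed":
            self.bits_transferred += compressed_bits(payload, self.word_bits)
            return decompress_block(payload)
        self.bits_transferred += self.word_bits * len(payload)
        return list(payload)
```

For example, with a region registered at address `0x1000` and `max_abs_error=3`, the controller drops 2 low-order bits of each delta, so an 8-word block of nearby values shrinks from 256 bits to 64 bits while every reconstructed word stays within 3 of the original. Blocks outside any approximate region, or blocks whose deltas are too wide to save space, are stored exactly. Truncation was chosen here because it gives a simple, provable per-word bound of 2^k − 1; a real quality-aware controller could use a different lossy encoding.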
