QuARK: Quality-configurable approximate STT-MRAM cache by fine-grained tuning of reliability-energy knobs

Emerging STT-MRAM memories are promising alternatives to SRAM, addressing its low density and high static power consumption, but they require high energy for reliable read and write operations. Many approximate computing applications, however, do not require absolute data integrity, which allows energy savings with minimal quality loss. This paper proposes QuARK, a hardware/software approach that trades reliability of STT-MRAM caches for energy savings in the on-chip memory hierarchy of multi- and many-core systems running approximate applications. In contrast to SRAM-based cache-way-level actuators, QuARK uses fine-grained, cache-line-level actuation knobs that offer different levels of reliability for individual read and write accesses. These knobs are unique to STT-MRAM and suit systems running multiple applications with mixed accuracy sensitivity, because they avoid inter-application actuation interference. Our experimental results with a set of recognition, mining, and synthesis (RMS) benchmarks demonstrate up to 40% energy savings over a fully protected STT-MRAM cache, with negligible loss in the quality of the generated outputs.
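The idea of cache-line-level actuation can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Reliability`, `CacheLine`, `select_knob`, and `access_energy` are hypothetical, the two reliability levels are an assumed simplification, and the relative energy values are arbitrary placeholders rather than measured results. The sketch only shows that the knob is chosen per line and per access type, so a line holding error-tolerant data can be relaxed without affecting a neighbouring line that belongs to an accuracy-sensitive application.

```python
from dataclasses import dataclass
from enum import Enum


class Reliability(Enum):
    """Hypothetical reliability levels for a single cache access."""
    FULL = "full"        # fully protected access
    RELAXED = "relaxed"  # lower-energy access, higher error probability


# Placeholder relative energy per access in arbitrary units.
# These numbers are illustrative assumptions, NOT values from the paper.
RELATIVE_ENERGY = {
    ("read", Reliability.FULL): 1.0,
    ("read", Reliability.RELAXED): 0.8,
    ("write", Reliability.FULL): 1.0,
    ("write", Reliability.RELAXED): 0.6,
}


@dataclass
class CacheLine:
    tag: int
    approximable: bool = False  # assumed to be set by software for error-tolerant data


def select_knob(line: CacheLine) -> Reliability:
    """Choose the reliability level for one line, independently of all other lines."""
    return Reliability.RELAXED if line.approximable else Reliability.FULL


def access_energy(line: CacheLine, op: str) -> float:
    """Return the placeholder relative energy of a 'read' or 'write' to this line."""
    return RELATIVE_ENERGY[(op, select_knob(line))]


# Two lines that could sit in the same cache way: one precise, one approximable.
precise_line = CacheLine(tag=0x1A)
approx_line = CacheLine(tag=0x2B, approximable=True)

for line in (precise_line, approx_line):
    for op in ("read", "write"):
        print(hex(line.tag), op, select_knob(line).value, access_energy(line, op))
```

Because the decision is made per line rather than per cache way, the precise line keeps full protection even when it shares a way with relaxed lines, which is the interference that a way-level actuator could not avoid.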
