Content-Aware Non-Volatile Cache Replacement

Spin-Transfer Torque Magnetoresistive Random-Access Memory (STT-MRAM) is a promising memory technology that offers high density, fast reads, low leakage power, and non-volatility, making it well suited to on-chip last-level caches in multi-core processors. However, its high write energy and latency, together with its limited write endurance, remain challenges. This paper proposes an encoded, content-aware cache replacement policy that reduces the total number of bits switched on writes, lowering write energy and improving write endurance. Instead of evicting the LRU block chosen by the conventional pseudo-LRU replacement policy, we select, from the blocks near the LRU position, the one whose content is most similar to that of the missed block. Writing the new block into this location can reduce the number of switched bits without degrading cache performance. To avoid fetching and comparing entire block contents, we present a novel content encoding method that represents a 64-byte block with just 8 bits, each bit summarizing 8 bytes of content. Each encoded bit is determined by the presence of a dominant bit value in the corresponding 8 bytes. We measure content similarity as the Hamming distance between the encoded bits of the missed block and those of each candidate replacement block. Performance evaluation shows that this simple encoding method is effective, reducing total switched bits by an average of 20.5%, which improves write endurance and lowers write energy consumption. These improvements are achieved with low overhead and minimal impact on cache performance.
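The encoding and victim-selection steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it interprets the "dominant bit value" as a majority of 1s (more than 32 of a chunk's 64 bits, with ties encoded as 0), and it assumes the caller supplies the candidate encodings already ordered from the LRU position toward MRU. The names `encode_block`, `hamming8`, and `select_victim`, and the `window` parameter (how many near-LRU blocks are considered), are hypothetical.

```python
from typing import Sequence

BLOCK_BYTES = 64             # cache block size stated in the abstract
CHUNK_BYTES = 8              # each encoded bit summarizes 8 bytes
CHUNK_BITS = CHUNK_BYTES * 8


def encode_block(block: bytes) -> int:
    """Encode a 64-byte block as an 8-bit signature (bit i covers chunk i).

    Assumption: a chunk's bit is 1 when 1s dominate, i.e. more than half of
    its 64 bits are set; otherwise (including an exact tie) it is 0.
    """
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")
    code = 0
    for i in range(BLOCK_BYTES // CHUNK_BYTES):
        chunk = block[i * CHUNK_BYTES:(i + 1) * CHUNK_BYTES]
        ones = sum(bin(b).count("1") for b in chunk)
        if ones > CHUNK_BITS // 2:
            code |= 1 << i
    return code


def hamming8(a: int, b: int) -> int:
    """Hamming distance between two 8-bit encodings."""
    return bin((a ^ b) & 0xFF).count("1")


def select_victim(missed_code: int,
                  candidate_codes: Sequence[int],
                  window: int = 4) -> int:
    """Pick the victim index among candidates ordered LRU (index 0) -> MRU.

    Only the first `window` candidates (hypothetical parameter) are examined.
    The candidate whose encoding is closest to the missed block's encoding
    wins; ties go to the candidate nearest the LRU end.
    """
    if not candidate_codes:
        raise ValueError("no replacement candidates")
    near_lru = candidate_codes[:window]
    # min() keeps the first minimal index, so ties favor the LRU end.
    return min(range(len(near_lru)),
               key=lambda i: hamming8(missed_code, near_lru[i]))
```

For example, with candidate encodings `[0xFF, 0x00, 0x0F]` and a missed block encoded as `0x0F`, `select_victim` returns index 2; with `window=2`, both remaining candidates are at distance 4, so the tie goes to the LRU block at index 0. Breaking ties toward the LRU end keeps this sketch equivalent to plain LRU victim choice whenever no candidate is more similar.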

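The switched-bit metric that the policy targets can be stated concretely: if the write circuitry flips only the cells whose stored value differs from the incoming data, the cost of overwriting a block is the population count of the XOR of its old and new contents. The sketch below assumes such a differential write; the function name `switched_bits` is hypothetical.

```python
def switched_bits(old: bytes, new: bytes) -> int:
    """Number of bit cells that flip when `new` overwrites `old`.

    Assumes a differential write: cells already holding the target value
    are not switched, so only differing bits cost write energy and wear.
    """
    if len(old) != len(new):
        raise ValueError("blocks must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(old, new))


if __name__ == "__main__":
    incoming = b"\xff" * 32 + bytes(32)                   # block fetched on the miss
    lru_victim = bytes(64)                                # all-zero block at LRU
    similar_victim = b"\xff" * 32 + b"\x01" + bytes(31)   # nearly identical block
    print(switched_bits(lru_victim, incoming))      # 256
    print(switched_bits(similar_victim, incoming))  # 1
```

In this toy case, overwriting the content-similar block instead of the LRU block reduces the write from 256 cell flips to 1.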