Multilane Racetrack caches: Improving efficiency through compression and independent shifting

Racetrack memory (RM), a spintronic domain-wall non-volatile memory has recently received attention as a high-capacity replacement for various structures in the memory system from secondary storage through caches. The main advantage of RM is an improved density and like other non-volatile memory structures, the static power of RM is dramatically lower than conventional CMOS memories. However, a major challenge of employing RM in universal memory components is the added access latency and dynamic energy consumption caused by shifts to align the data of interest with an access port. We propose multilane Racetrack caches (MRC), a RM last level cache design utilizing lightweight compression combined with independent shifting. MRC allows cache lines mapped to the same Racetrack structure to be accessed in parallel when compressed, mitigating potential shifting stalls in the RM cache. Our results demonstrate that unlike previously proposed RM caches, an isocapacity MRC cache replacement can outperform SRAM caches while providing energy improvement over STT-MRAM caches. In particular, MRC improves performance by 5% and reduces energy by 19% compared to an isocapacity baseline RM cache resulting in an energy delay product improvement of 25%.

[1]  Krste Asanovic,et al.  Dynamic zero compression for cache energy reduction , 2000, MICRO 33.

[2]  David A. Wood,et al.  Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches , 2004 .

[3]  Kaushik Roy,et al.  DWM-TAPESTRI - An energy efficient all-spin cache using domain wall shift based writes , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[4]  S. Parkin,et al.  Magnetic Domain-Wall Racetrack Memory , 2008, Science.

[5]  Hyunjin Lee,et al.  Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[6]  Shunsuke Fukami,et al.  Micromagnetic analysis of current driven domain wall motion in nanostrips with perpendicular magnetic anisotropy , 2008 .

[7]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[8]  Onur Mutlu,et al.  Linearly compressed pages: A low-complexity, low-latency main memory compression framework , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[9]  Yiran Chen,et al.  Exploration of GPGPU register file architecture using domain-wall-shift-write based racetrack memory , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[10]  George F. Riley,et al.  Round-robin Arbiter Design and Generation , 2002, 15th International Symposium on System Synthesis, 2002..

[11]  Kaushik Roy,et al.  TapeCache: a high density, energy efficient cache based on domain wall memory , 2012, ISLPED '12.

[12]  James E. Smith,et al.  The predictability of data values , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[13]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[14]  Jun Yang,et al.  Frequent value compression in data caches , 2000, MICRO 33.

[15]  Norman P. Jouppi,et al.  CACTI 6.0: A Tool to Model Large Caches , 2009 .

[16]  Onur Mutlu,et al.  Base-delta-immediate compression: Practical data compression for on-chip caches , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[17]  Lieven Eeckhout,et al.  Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[18]  K. Roy,et al.  Numerical analysis of domain wall propagation for dense memory arrays , 2011, 2011 International Electron Devices Meeting.

[19]  Jun Yang,et al.  Energy reduction for STT-RAM using early write termination , 2009, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.

[20]  Amit Grover Analysis and Comparison: Full Adder Block in Submicron Technology , 2013, 2013 Fifth International Conference on Computational Intelligence, Modelling and Simulation.

[21]  Yiran Chen,et al.  Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[22]  Wenqing Wu,et al.  Cross-layer racetrack memory design for ultra high density and low power consumption , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[23]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[24]  Yuan Xie,et al.  Modeling, Architecture, and Applications for Emerging Memory Technologies , 2011, IEEE Design & Test of Computers.

[25]  Rajeev Balasubramonian,et al.  MemZip: Exploring unconventional benefits from memory compression , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[26]  Kaushik Roy,et al.  Multi-level magnetic RAM using domain wall shift for energy-efficient, high-density caches , 2013, International Symposium on Low Power Electronics and Design (ISLPED).