High-endurance hybrid cache design in CMP architecture with cache partitioning and access-aware policy

In recent years, non-volatile memory (NVM) technologies such as STT-RAM (spin-transfer torque RAM) and PRAM (phase-change RAM) have drawn considerable attention for their low leakage and high density. However, both suffer from high write latency and limited write endurance. To overcome these problems, SRAM/NVM hybrid cache architectures have been proposed, in which an appropriate write management policy mitigates the write pressure on the NVM. In addition, many wear-leveling techniques have been proposed to extend the lifetime of the NVM in a hybrid cache. In this paper, we propose a hybrid cache design for CMP (chip multiprocessor) architectures that includes SRAM cache, STT-RAM cache, and STT-RAM/SRAM hybrid cache banks. We also propose a partition-level wear-leveling scheme and access-aware policies to mitigate unbalanced wear-out of STT-RAM lines within a partition and among different cache partitions. Experimental results show that the proposed scheme and policies achieve an average 89-fold improvement in cache lifetime and save 58% of power consumption compared to an SRAM cache.
