Thermosiphon: A thermal aware NUCA architecture for write energy reduction of the STT-MRAM based LLCs

As the speed gap between modern processors and off-chip main memory widens, on-chip cache capacity increases to sustain performance scaling. As a result, cache power accounts for a large portion of the total power budget. STT-MRAM (Spin-Transfer Torque Magnetic Random Access Memory) has been proposed as a promising solution for low-power cache design because of its high integration density and ultra-low leakage. Nevertheless, the high write power and latency of STT-MRAM are new barriers to the commercialization of this emerging technology. In this paper, we investigate the thermal effect on the access performance of STT-MRAM and observe that temperature can significantly affect the write delay and energy. We then explore the NUCA (Non-Uniform Cache Access) design of CMPs (Chip Multi-Processors) with an STT-MRAM based LLC (Last Level Cache). We propose a thermal-aware data migration policy, called “Thermosiphon”, which takes advantage of the thermal property of STT-MRAM to reduce the LLC write energy. The policy splits the LLC into regions based on the thermal distribution and adaptively migrates write-intensive data according to the temperature gradient among the thermal regions. Compared to the conventional NUCA design, our proposed design saves 22.5% of the write energy with negligible hardware overhead.
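
The policy outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the region count, the migration trigger, or the migration direction, so everything here is assumed for illustration. In particular, the sketch assumes that STT-MRAM writes are cheaper in hotter banks, so write-intensive blocks move one region hotter when the temperature gradient is large enough. The names `Bank`, `split_into_regions`, `migrate_write_intensive`, `write_threshold`, and `min_gradient_c` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bank:
    bank_id: int
    temperature_c: float
    # Maps block address -> observed write count (hypothetical bookkeeping).
    blocks: dict = field(default_factory=dict)


def split_into_regions(banks, num_regions):
    """Group banks into thermal regions, coolest region first."""
    ordered = sorted(banks, key=lambda b: b.temperature_c)
    size = -(-len(ordered) // num_regions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def mean_temp(region):
    return sum(b.temperature_c for b in region) / len(region)


def migrate_write_intensive(regions, write_threshold, min_gradient_c):
    """Move write-heavy blocks one region hotter if the gradient allows it.

    ASSUMPTION: hotter banks have lower write energy. Region pairs are
    visited from hottest to coolest so a block moves at most one step
    per call.
    """
    moves = []
    for idx in reversed(range(len(regions) - 1)):
        src, dst = regions[idx], regions[idx + 1]
        if mean_temp(dst) - mean_temp(src) < min_gradient_c:
            continue  # gradient too small to justify a migration
        target = max(dst, key=lambda b: b.temperature_c)
        for bank in src:
            hot = [a for a, w in bank.blocks.items() if w >= write_threshold]
            for addr in hot:
                target.blocks[addr] = bank.blocks.pop(addr)
                moves.append((addr, bank.bank_id, target.bank_id))
    return moves
```

A real design would also have to account for the cost of the migration itself and for the added access latency to the new bank, neither of which is modeled here.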
