LAP: Loop-Block Aware Inclusion Properties for Energy-Efficient Asymmetric Last Level Caches

Emerging non-volatile memory (NVM) technologies, such as spin-transfer torque RAM (STT-RAM), are attractive options for replacing or augmenting SRAM in implementing last-level caches (LLCs). However, the asymmetric read/write energy and latency associated with NVM introduces new challenges in designing caches where, in contrast to SRAM, dynamic energy from write operations can be responsible for a larger fraction of total cache energy than leakage. These properties lead to the fact that no single traditional inclusion policy being dominant in terms of LLC energy consumption for asymmetric LLCs. We propose a novel selective inclusion policy, Loop-block-Aware Policy (LAP), to reduce energy consumption in LLCs with asymmetric read/write properties. In order to eliminate redundant writes to the LLC, LAP incorporates advantages from both non-inclusive and exclusive designs to selectively cache only part of upper-level data in the LLC. Results show that LAP outperforms other variants of selective inclusion policies and consumes 20% and 12% less energy than non-inclusive and exclusive STT-RAM-based LLCs, respectively. We extend LAP to a system with SRAM/STT-RAM hybrid LLCs to achieve energy-efficient data placement, reducing the energy consumption by 22% and 15% over non-inclusion and exclusion on average, with average-case performance improvements, small worst-case performance loss, and minimal hardware overheads.

[1]  Aamer Jaleel,et al.  Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[2]  Aamer Jaleel,et al.  High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[3]  Mohamed M. Zahran,et al.  Non-Inclusion Property in Multi-Level Caches Revisited , 2007, Int. J. Comput. Their Appl..

[4]  M. J. Irwin,et al.  Dswitch : Write-aware Dynamic Inclusion Property Switching for Emerging Asymmetric Memory Technologies , 2016 .

[5]  Ying Zheng,et al.  Performance evaluation of exclusive cache hierarchies , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[6]  Jason Cong,et al.  Static and dynamic co-optimizations for blocks mapping in hybrid caches , 2012, ISLPED '12.

[7]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[8]  Adrian Moga,et al.  High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[9]  Balaram Sinharoy,et al.  The implementation of POWER7TM: A highly parallel and scalable multi-core high-end server processor , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[10]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[11]  Shoji Ikeda,et al.  1Mb 4T-2MTJ nonvolatile STT-RAM for embedded memories using 32b fine-grained power gating technique with 1.0ns/200ps wake-up/power-off times , 2012, 2012 Symposium on VLSI Circuits (VLSIC).

[12]  Norman P. Jouppi,et al.  CACTI 6.0: A Tool to Model Large Caches , 2009 .

[13]  Satoshi Takaya,et al.  7.5 A 3.3ns-access-time 71.2μW/MHz 1Mb embedded STT-MRAM using physically eliminated read-disturb scheme and normally-off memory architecture , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[14]  Xueti Tang,et al.  Spin-transfer torque magnetic random access memory (STT-MRAM) , 2013, JETC.

[15]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[16]  William Song,et al.  Negative-resistance read and write schemes for STT-MRAM in 0.13µm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[17]  Yu Wang,et al.  PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[18]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[19]  Xiaoxia Wu,et al.  Hybrid cache architecture with disparate memory technologies , 2009, ISCA '09.

[20]  Saori Kashiwada,et al.  A 250-MHz 256b-I/O 1-Mb STT-MRAM with advanced perpendicular MTJ based dual cell for nonvolatile magnetic caches to reduce active power of processors , 2013, 2013 Symposium on VLSI Circuits.

[21]  Wen-Hann Wang,et al.  On the inclusion properties for multi-level cache hierarchies , 1988, ISCA '88.

[22]  Yoshihiro Ueda,et al.  A 64Mb MRAM with clamped-reference and adequate-reference schemes , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[23]  Aamer Jaleel,et al.  Adaptive insertion policies for high performance caching , 2007, ISCA '07.

[24]  A. Driskill-Smith,et al.  Fully integrated 54nm STT-RAM with the smallest bit cell dimension for high density memory application , 2010, 2010 International Electron Devices Meeting.

[25]  Kiyoung Choi,et al.  DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[26]  A. Kumar,et al.  Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip , 2008, IEEE Journal of Solid-State Circuits.

[27]  Mainak Chaudhuri,et al.  Bypass and insertion algorithms for exclusive last-level caches , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[28]  Christopher Mozak,et al.  Westmere: A family of 32nm IA processors , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[29]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[30]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[31]  Cong Xu,et al.  Adaptive placement and migration policy for an STT-RAM-based hybrid cache , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[32]  Dmitri B. Strukov,et al.  Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs , 2016, ISCA.

[33]  Hiroki Noguchi,et al.  Highly reliable and low-power nonvolatile cache memory with advanced perpendicular STT-MRAM for high-performance CPU , 2014, 2014 Symposium on VLSI Circuits Digest of Technical Papers.

[34]  Hyunjin Lee,et al.  Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[35]  Hyesoon Kim,et al.  FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[36]  Mircea R. Stan,et al.  Relaxing non-volatility for fast and energy-efficient STT-RAM caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[37]  N. Jouppi,et al.  Tradeoffs in two-level on-chip caching , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[38]  Bruce Jacob,et al.  Technology comparison for large last-level caches (L3Cs): Low-leakage SRAM, low write-energy STT-RAM, and refresh-optimized eDRAM , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[39]  Jun Yang,et al.  Phase-Change Technology and the Future of Main Memory , 2010, IEEE Micro.

[40]  Tao Zhang,et al.  Overcoming the challenges of crossbar resistive memory architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).