Preventing STT-RAM Last-Level Caches from Port Obstruction

Many new nonvolatile memory (NVM) technologies have been heavily studied to replace the power-hungry SRAM/DRAM-based memory hierarchy in today's computers. Among various emerging NVM technologies, Spin-Transfer Torque RAM (STT-RAM) has many benefits, such as fast read latency, low leakage power, and high density, making it a promising candidate for last-level caches (LLCs).1 However, STT-RAM write operation is expensive. In particular, a long STT-RAM cache write operation might obstruct other cache accesses and result in severe performance degradation. Consequently, how to mitigate STT-RAM write overhead is critical to the success of STT-RAM adoption. In this article, we propose an obstruction-aware cache management policy called OAP. OAP monitors cache traffic, detects LLC-obstructive processes, and differentiates the cache accesses from different processes. Our experiment on a four-core architecture with an 8MB STT-RAM L3 cache shows a 14% performance improvement and 64% energy reduction.

[1]  Aamer Jaleel,et al.  High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[2]  Yoshihiro Ueda,et al.  A 64Mb MRAM with clamped-reference and adequate-reference schemes , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[3]  Yiran Chen,et al.  Design of Last-Level On-Chip Cache Using Spin-Torque Transfer RAM (STT RAM) , 2011, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[4]  Gabriel H. Loh,et al.  PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches , 2009, ISCA '09.

[5]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[6]  S. Kim,et al.  Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[7]  Mahmut T. Kandemir,et al.  SHARP control: Controlled shared cache management in chip multiprocessors , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[8]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[9]  Yiran Chen,et al.  Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[10]  Mircea R. Stan,et al.  Relaxing non-volatility for fast and energy-efficient STT-RAM caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[11]  Yiran Chen,et al.  A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[12]  Shoji Ikeda,et al.  2Mb Spin-Transfer Torque RAM (SPRAM) with Bit-by-Bit Bidirectional Current Write and Parallelizing-Direction Current Read , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[13]  Kumiko Nomura,et al.  Ultra low power processor using perpendicular-STT-MRAM/SRAM based hybrid cache toward next generation normally-off computers , 2012 .

[14]  John K. DeBrosse,et al.  Design considerations for MRAM , 2006, IBM J. Res. Dev..

[15]  Jun Yang,et al.  Energy reduction for STT-RAM using early write termination , 2009, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.

[16]  Jung Ho Ahn,et al.  A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies , 2008, 2008 International Symposium on Computer Architecture.

[17]  S. Watts,et al.  Latest Advances and Roadmap for In-Plane and Perpendicular STT-RAM , 2011, 2011 3rd IEEE International Memory Workshop (IMW).

[18]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[19]  C.H. Kim,et al.  A forward body-biased-low-leakage SRAM cache: device and architecture considerations , 2003, Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03..

[20]  Saurabh Gupta,et al.  Adaptive Cache Bypassing for Inclusive Last Level Caches , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[21]  Wenqing Wu,et al.  Multi retention level STT-RAM cache designs with a dynamic refresh scheme , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[22]  Kaushik Roy,et al.  A forward body-biased low-leakage SRAM cache: device and architecture considerations , 2003, ISLPED '03.