Reliability Tradeoffs in Design of Volatile and Nonvolatile Caches

Researchers have explored both volatile memories (e.g., SRAM and embedded DRAM) and nonvolatile memories (NVMs, such as resistive RAM) for the design of on-chip caches. However, both volatile and nonvolatile memories present unique reliability challenges. NVMs are immune to radiation-induced soft errors; however, because of their limited write endurance, they are vulnerable to hard errors under nonuniform write distribution. By contrast, SRAM has high write endurance but is susceptible to soft errors caused by cosmic radiation. SRAM–NVM hybrid caches and their management techniques aim to bring together the best of SRAM and NVM; however, their reliability implications have not been well understood. In this paper, we show that there are inherent tradeoffs in improving resilience to hard and soft errors in hybrid caches, such that mitigating one may aggravate the other. We confirm this through experiments with two recent hybrid cache management techniques. We also re-examine cache design trends in modern processors from a reliability perspective. This paper provides valuable insights to help system developers make reliability-aware design decisions.
