A System-Level Framework for Analytical and Empirical Reliability Exploration of STT-MRAM Caches

Spin-transfer torque magnetic RAM (STT-MRAM) is widely regarded as the most promising replacement for static random access memory (SRAM) in large last-level caches (LLCs). Despite its major advantages of high density, nonvolatility, near-zero leakage power, and immunity to radiation, STT-MRAM-based cache memory suffers from high error rates, mainly due to retention failure (RF), read disturbance, and write failure. Existing studies are limited to estimating the rate of only one or two of these error types for STT-MRAM caches. The overall vulnerability of STT-MRAM caches, whose estimation is essential for designing cost-efficient reliable caches, has not been studied previously. In this paper, we propose a system-level framework for reliability exploration and characterization of error behavior in STT-MRAM caches. To this end, we formulate the cache vulnerability considering the intercorrelation of the error types, including RF, read disturbance, and write failure, as well as the dependency of error rates on workload behavior and process variations (PVs). Our analysis reveals that STT-MRAM cache vulnerability is highly workload-dependent and varies by orders of magnitude across cache access patterns. Our analytical study also shows that this vulnerability divergence increases significantly with PVs in STT-MRAM cells. To take the effects of system workloads and PVs into account, we implement the error types in the gem5 full-system simulator. Experimental results using a comprehensive set of multiprogrammed workloads from the SPEC CPU2006 benchmark suite on a quad-core processor show that the total error rate in a shared STT-MRAM LLC varies by 32.0× across workloads. A further 6.5× vulnerability variation is observed when considering PVs in the STT-MRAM cells. In addition, the contribution of each error type to total LLC vulnerability varies widely across cache access patterns, and the error rates are affected differently by PVs. The proposed analytical and empirical studies can significantly help system architects use error mitigation techniques efficiently and design highly reliable, low-cost STT-MRAM LLCs.
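
To give a rough sense of why the error rates depend on access patterns, the following minimal sketch uses the thermal-activation model commonly found in the STT-MRAM literature. It is not the formulation proposed in this paper, which is not given in the abstract. The attempt period `TAU0_S`, the function names, and all parameters are illustrative assumptions, and cells are treated as independent, a simplification that ignores the intercorrelation of error types the framework accounts for.

```python
import math

# Assumed attempt period of 1 ns (a typical textbook value, not from the paper).
TAU0_S = 1e-9


def retention_failure_prob(delta, idle_time_s):
    """Probability that one idle cell flips within idle_time_s seconds.

    delta is the thermal stability factor of the cell (hypothetical input).
    """
    rate = math.exp(-delta) / TAU0_S  # thermally activated flip rate
    return -math.expm1(-idle_time_s * rate)  # 1 - exp(-t * rate), computed stably


def read_disturb_prob(delta, read_pulse_s, current_ratio, n_reads=1):
    """Probability that one cell flips over n_reads consecutive reads.

    current_ratio is the assumed ratio of read current to critical current;
    the read current lowers the effective energy barrier.
    """
    rate = math.exp(-delta * (1.0 - current_ratio)) / TAU0_S
    return -math.expm1(-n_reads * read_pulse_s * rate)


def block_error_prob(p_cell, n_bits=512):
    """Probability that at least one of n_bits independent cells is in error."""
    if p_cell >= 1.0:
        return 1.0
    return -math.expm1(n_bits * math.log1p(-p_cell))


if __name__ == "__main__":
    # A block that is read often accumulates more read disturbance,
    # while a block that sits idle for long accumulates more retention risk.
    for n_reads in (1, 100, 10_000):
        p = read_disturb_prob(delta=60, read_pulse_s=2e-9,
                              current_ratio=0.5, n_reads=n_reads)
        print(n_reads, block_error_prob(p))
```

Under these assumptions, the per-block error probability grows with both the number of reads between writes and the idle time, which is one simple way to see why vulnerability can differ widely between workloads.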
