On the characterization and optimization of system-level vulnerability for instruction caches in embedded processors

With the continued scaling of semiconductor technology, soft errors induced by energetic particles have become a growing challenge in designing current and next-generation reliable microprocessors. Because cache memories occupy a large share of the transistor budget and die area, they are increasingly vulnerable to soft errors. Previous work based on vulnerability factor (VF) analysis proposed analytical models to evaluate the reliability of on-chip data and instruction caches. However, no system-level study of instruction cache vulnerability has been available. In this paper, we propose a new analytical model to estimate the system-level vulnerability factor of on-chip instruction caches in embedded processors. Our model accounts for the error masking and detection effects within instructions that arise from the Instruction Set Architecture (ISA). Our experimental results show that self-error-masking/detection in instructions reduces the VF of the instruction caches compared with the previous study. We also illustrate our design methodology by proposing several optimization schemes to improve reliability. Benchmarking demonstrates the effectiveness of our vulnerability model and optimization approach, which can provide insightful guidance for future reliable instruction cache and ISA design.
