Computing cache vulnerability to transient errors and its implication

Transient errors caused by particle strikes have become a critical challenge for microprocessor design. As the major consumer of on-chip real estate, cache memories are particularly susceptible to transient errors. However, not every soft error in a cache propagates to the processor. For instance, a corrupted value that is overwritten by a write operation before it is read has no effect. In this paper, we define the cache vulnerability factor (CVF) as the probability that a fault in the cache propagates to the processor or to other levels of the memory hierarchy. We also propose an approach to compute the CVF based on cache line access patterns. Building upon the CVF, we evaluate the reliability of different cache memories. Our results show that 83.5% of soft errors in a write-through data cache are masked without affecting other components. We also propose two early write-back strategies that improve the reliability of write-back data caches (i.e., reduce their CVF) without compromising performance.
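The idea of deriving a vulnerability factor from access patterns can be illustrated with a minimal sketch. It is not the paper's algorithm; it rests on several simplifying assumptions: each access touches the whole line, a fault is equally likely in any cycle, an interval that ends in a read is vulnerable, an interval that ends in a write is masked, and in a write-back cache an interval that ends with a dirty line being written to memory is vulnerable as well. The event names, `vulnerable_cycles`, and `cvf` are hypothetical.

```python
# Hypothetical event kinds for one cache line (names are not from the paper).
FILL, READ, WRITE, WRITEBACK, EVICT = "fill", "read", "write", "writeback", "evict"


def vulnerable_cycles(events, write_back):
    """Count cycles in which a fault in the line would propagate.

    events: time-ordered list of (cycle, kind) tuples for a single line.
    write_back: True for a write-back cache, False for write-through.
    """
    vulnerable = 0
    dirty = False
    for (t0, k0), (t1, k1) in zip(events, events[1:]):
        if k0 == EVICT:
            dirty = False
            continue  # the line holds no valid data until the next fill
        if k0 == WRITE:
            dirty = True
        elif k0 in (FILL, WRITEBACK):
            dirty = False  # the line now matches the next memory level
        span = t1 - t0
        if k1 == READ:
            vulnerable += span  # corrupted data would reach the processor
        elif k1 in (EVICT, WRITEBACK) and write_back and dirty:
            vulnerable += span  # corrupted data would reach memory
        # Otherwise the fault is overwritten or discarded, so it is masked.
    return vulnerable


def cvf(lines, total_cycles, write_back):
    """Fraction of (line, cycle) pairs in which a fault would propagate."""
    total = sum(vulnerable_cycles(ev, write_back) for ev in lines)
    return total / (len(lines) * total_cycles)


trace = [(0, FILL), (10, READ), (30, WRITE), (50, READ), (100, EVICT)]
# Same trace with an assumed early write-back at cycle 60.
early = trace[:4] + [(60, WRITEBACK), (100, EVICT)]

print(cvf([trace], 100, write_back=False))
print(cvf([trace], 100, write_back=True))
print(cvf([early], 100, write_back=True))
```

Under these assumptions, the same trace is more vulnerable in a write-back cache, because the dirty line stays exposed until eviction, and writing the line back earlier shortens that exposure. This is the intuition behind reducing the CVF with early write-back, shown here only on a toy trace.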
