Enhancing data cache reliability by the addition of a small fully-associative replication cache

Soft-error-conscious cache design is a necessity for reliable computing. The ECC- and parity-based integrity checking techniques in use today trade performance for reliability or vice versa, and the N-modular redundancy (NMR) scheme is too costly for microprocessors and applications with stringent cost constraints. This paper proposes a novel and cost-effective solution that enhances data reliability with minimal impact on performance. The idea is to add a small fully-associative cache that stores the replica(s) of every write to the L1 data cache. The replicas can be used to detect and correct soft errors. The replication cache can also be used to increase performance by reducing the L1 data cache miss rate. Our experiments show that, with a replication cache of 8 blocks, more than 97% of the read hits in the L1 data cache find a replica available.
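
The mechanism described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class and method names (ReplicationCache, ProtectedL1, record_write, inject_bit_flip), the LRU replacement policy, the word-level granularity, and the use of a single parity bit on the L1 side to decide when to fall back on a replica are all assumptions made for illustration. The abstract specifies only that a small fully-associative cache holds replicas of writes to the L1 data cache.

```python
from collections import OrderedDict


def parity(word: int) -> int:
    """Return the parity of the set bits in a word (assumed L1 check)."""
    return bin(word).count("1") & 1


class ReplicationCache:
    """Small fully-associative replica store. LRU replacement is an assumption."""

    def __init__(self, num_entries: int = 8):
        self.num_entries = num_entries
        self.entries = OrderedDict()  # address -> replica value

    def record_write(self, addr: int, value: int) -> None:
        if addr in self.entries:
            self.entries.move_to_end(addr)      # refresh LRU position
        elif len(self.entries) >= self.num_entries:
            self.entries.popitem(last=False)    # evict least recently used
        self.entries[addr] = value

    def lookup(self, addr: int):
        if addr in self.entries:
            self.entries.move_to_end(addr)
            return self.entries[addr]
        return None


class ProtectedL1:
    """Toy L1 data store: every write is replicated, reads are parity-checked."""

    def __init__(self, replicas: ReplicationCache):
        self.data = {}  # address -> (value, parity bit)
        self.replicas = replicas

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = (value, parity(value))
        self.replicas.record_write(addr, value)

    def read(self, addr: int):
        value, stored_parity = self.data[addr]
        if parity(value) == stored_parity:
            return value, "ok"
        replica = self.replicas.lookup(addr)
        if replica is None:
            return value, "detected"            # error seen, no replica left
        self.data[addr] = (replica, parity(replica))  # repair the L1 copy
        return replica, "corrected"

    def inject_bit_flip(self, addr: int, bit: int) -> None:
        """Simulate a soft error: flip one data bit, leave parity untouched."""
        value, stored_parity = self.data[addr]
        self.data[addr] = (value ^ (1 << bit), stored_parity)
```

In this sketch, a read whose parity check fails is recovered from the replica when one is still resident; once the replica has been evicted, the error can only be reported. The fraction of reads that find a replica is therefore what bounds the achievable correction coverage, which is the quantity the 97% figure in the abstract refers to.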
