Exploiting Same Tag Bits to Improve the Reliability of the Cache Memories

With the trend of increasing transient error rate, it is becoming important to prevent transient errors and provide a correction mechanism for hardware circuits, especially for SRAM cache memories. Caches are the largest structures in current microprocessors and, hence, are most vulnerable to the transient errors. Tag bits in cache memories are also exposed to transient errors but a few efforts have been made to reduce their vulnerability. In this paper, we propose to exploit prevalent same tag bits to improve error protection capability of the tag bits in the caches. When data are fetched from the main memory, it is checked if adjacent cache lines have the same tag bits as those of the data fetched. This same tag bit information is stored in the caches as extra bits to be used later. When an error is detected in the tag bits, the same tag bit information is used to recover from the error in the tag bits. The proposed scheme has small area, energy, and performance overheads with error protection coverage of 97.9% on average. Even with large working sets and various cache sizes, our scheme shows protection coverage of higher than 95% on average.

[1]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[2]  Margaret Martonosi,et al.  Cache decay: exploiting generational behavior to reduce cache leakage power , 2001, ISCA 2001.

[3]  Joel S. Emer,et al.  THE SECOND AVOIDS DECLARING ERRORS ON BENIGN FAULTS . APPLYING THESE TECHNIQUES TO A MICROPROCESSOR INSTRUCTION QUEUE SIGNIFICANTLY REDUCES ITS ERROR RATE WITH ONLY MINOR PERFORMANCE DEGRADATION . REDUCING THE SOFT-ERROR RATE OF A HIGH-PERFORMANCE MICROPROCESSOR , 2005 .

[4]  Shuichi Sakai,et al.  Mitigating soft errors in highly associative cache with CAM-based tag , 2005, 2005 International Conference on Computer Design.

[5]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[6]  Norbert Wehn,et al.  XEEMU: An Improved XScale Power Simulator , 2007, PATMOS.

[7]  Babak Falsafi,et al.  Mitigating multi-bit soft errors in L1 caches using last-store prediction , 2007 .

[8]  Shuai Wang,et al.  Replicating Tag Entries for Reliability Enhancement in Cache Tag Arrays , 2012, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[9]  Norman P. Jouppi,et al.  CACTI 6.0: A Tool to Model Large Caches , 2009 .

[10]  Wei Zhang,et al.  ICR: in-cache replication for enhancing data cache reliability , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[11]  A. Seznec,et al.  Decoupled sectored caches: conciliating low tag implementation cost and low miss ratio , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[12]  John L. Henning SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.

[13]  Wei Zhang,et al.  Replication cache: a small fully associative cache to improve data cache reliability , 2005, IEEE Transactions on Computers.

[14]  Soontae Kim Reducing Area Overhead for Error-Protecting Large L2/L3 Caches , 2009, IEEE Trans. Computers.

[15]  Mehdi Baradaran Tahoori,et al.  Vulnerability Analysis of L2 Cache Elements to Single Event Upsets , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[16]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[17]  N. Ranganathan,et al.  A Framework for Correction of Multi-Bit Soft Errors in L2 Caches Based on Redundancy , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[18]  Arun K. Somani,et al.  Area efficient architectures for information integrity in cache memories , 1999, ISCA.

[19]  Nhon Quach,et al.  High Availability and Reliability in the Itanium Processor , 2000, IEEE Micro.

[20]  C.W. Slayman,et al.  Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations , 2005, IEEE Transactions on Device and Materials Reliability.

[21]  Shuai Wang,et al.  TRB: Tag Replication Buffer for Enhancing the Reliability of the Cache Tag Array , 2010, 2010 IEEE Computer Society Annual Symposium on VLSI.

[22]  Osman S. Unsal,et al.  Exploiting Narrow Values for Soft Error Tolerance , 2006, IEEE Computer Architecture Letters.

[23]  Brad Calder,et al.  SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.

[24]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[25]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[26]  Joel Emer,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[27]  P.N. Sanda,et al.  IBM z990 soft error detection and recovery , 2005, IEEE Transactions on Device and Materials Reliability.

[28]  Sanjay J. Patel,et al.  ReStore: Symptom-Based Soft Error Detection in Microprocessors , 2006, IEEE Trans. Dependable Secur. Comput..

[29]  Gabriel H. Loh,et al.  Zesto: A cycle-level simulator for highly detailed microarchitecture exploration , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[30]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[31]  Luca Benini,et al.  Low power error resilient encoding for on-chip data buses , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[32]  Tryggve Fossum,et al.  Cache scrubbing in microprocessors: myth or necessity? , 2004, 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings..