Mitigating soft error failures for multimedia applications by selective data protection

With advances in process technology, soft errors (SEs) are becoming an increasingly critical design concern. Because of their large area and high density, caches are the hardest hit by soft errors. Although Error Correction Code (ECC) based mechanisms protect the data in caches, they incur high performance and power overheads. Multimedia applications are increasingly used in mission-critical embedded systems, where both reliability and energy are major concerns, so there is a definite need to improve the reliability of embedded systems without excessive energy overhead. We observe that while a soft error in multimedia data may only result in a minor loss in QoS, a soft error in a variable that controls the execution flow of the program may be fatal. Consequently, we propose to partition the data space into failure-critical and failure non-critical data, and to provide a high degree of soft error protection only to the failure-critical data in Horizontally Partitioned Caches. Experimental results demonstrate that our selective data protection achieves a failure rate close to that of a soft-error-protected cache system, while keeping performance and energy consumption similar to those of a traditional cache system, with some degradation in QoS. For example, for a conventional configuration such as that of the Intel XScale, our approach achieves the same failure rate while improving performance by 28% and reducing energy consumption by 29% compared with a soft-error-protected cache.
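The selective protection idea can be illustrated with a small Python model. This is a sketch under stated assumptions, not the authors' simulator: the two direct-mapped partitions and their sizes, the critical address range `[0x1000, 0x1040)`, the 64x64 frame layout, and the helper names `DirectMappedCache`, `HorizontallyPartitionedCache`, and `read_after_upset` are all hypothetical. ECC is modeled abstractly as correcting any single-bit upset.

```python
class DirectMappedCache:
    """Tiny direct-mapped cache model that only counts hits and misses."""

    def __init__(self, num_lines: int, line_size: int) -> None:
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        block = addr // self.line_size
        index, tag = block % self.num_lines, block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.tags[index] = tag  # allocate on miss
            self.misses += 1


class HorizontallyPartitionedCache:
    """Two caches at the same level (hypothetical model): an ECC-protected
    one for failure-critical data and an unprotected one for everything else."""

    def __init__(self, critical_ranges, protected, unprotected) -> None:
        self.critical_ranges = critical_ranges  # half-open (start, end) pairs
        self.protected = protected
        self.unprotected = unprotected

    def is_critical(self, addr: int) -> bool:
        return any(lo <= addr < hi for lo, hi in self.critical_ranges)

    def access(self, addr: int) -> None:
        # Each address maps to exactly one partition, so a block is
        # never cached in both.
        target = self.protected if self.is_critical(addr) else self.unprotected
        target.access(addr)


def read_after_upset(value: int, bit: int, protected: bool) -> int:
    """Value read back after a single-bit upset in the stored word.
    Assumption: the protected partition's code corrects any single-bit error."""
    return value if protected else value ^ (1 << bit)


# Hypothetical layout: control data in [0x1000, 0x1040), a 64x64
# 8-bit frame starting at 0x8000.
ROWS, COLS, FRAME_BASE = 64, 64, 0x8000
hpc = HorizontallyPartitionedCache(
    critical_ranges=[(0x1000, 0x1040)],
    protected=DirectMappedCache(num_lines=8, line_size=32),
    unprotected=DirectMappedCache(num_lines=256, line_size=32),
)

for row in range(ROWS):
    hpc.access(0x1000)  # hypothetical row counter / loop bound (critical)
    for col in range(COLS):
        hpc.access(FRAME_BASE + row * COLS + col)  # pixel byte (non-critical)

protected_accesses = hpc.protected.hits + hpc.protected.misses
total_accesses = protected_accesses + hpc.unprotected.hits + hpc.unprotected.misses
protected_share = protected_accesses / total_accesses
print(f"accesses paying the ECC cost: {protected_share:.1%}")  # 1.5%

# An upset in a pixel perturbs one sample; an upset in an unprotected
# loop bound changes how many rows the program walks.
print(read_after_upset(200, 0, protected=False))   # 201: slightly different pixel
print(read_after_upset(ROWS, 7, protected=False))  # 192: loop would overrun frame
print(read_after_upset(ROWS, 7, protected=True))   # 64: corrected by ECC
```

Because every block lives in exactly one partition, the ECC latency and energy are paid only on accesses to failure-critical data, which in this toy workload is one access per row. Real savings depend on the size of the critical working set and on how the protection is implemented in hardware, so the 1.5% share here says nothing about the results reported above.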
