Using loop invariants to fight soft errors in data caches

Continued scaling of process technology makes embedded systems more vulnerable to soft errors than in the past. One generic method for fighting soft errors duplicates instructions in either the spatial or the temporal domain and then compares the results to see whether they differ. This full-duplication scheme, though effective, is very expensive in terms of performance, power, and memory space. In this paper, we propose an alternative scheme based on loop invariants and present experimental results showing that our approach catches 62% of the errors caught by full duplication, averaged over all benchmarks tested. In addition, it reduces the execution cycles and memory demand of the full-duplication strategy by 80% and 4%, respectively.
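To make the contrast concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy loop in which `offset` advances by 4 on every iteration, so that `offset == 4 * i` is a loop invariant. The helper names (`flip_bit`, `detect_with_duplication`, `detect_with_invariant`) and the `fault_at` injection parameter are hypothetical and exist only for illustration.

```python
def flip_bit(value: int, bit: int) -> int:
    """Model a soft error by flipping one bit of a stored value."""
    return value ^ (1 << bit)


def detect_with_duplication(n: int, fault_at=None) -> bool:
    """Full duplication: every update is performed twice and the copies are compared."""
    i = i_dup = 0
    offset = offset_dup = 0
    while i < n:
        offset += 4
        offset_dup += 4              # duplicated instruction
        if fault_at == i:
            offset = flip_bit(offset, 3)  # injected fault hits the primary copy only
        i += 1
        i_dup += 1                   # duplicated instruction
    # An error is detected when the two copies disagree.
    return offset != offset_dup or i != i_dup


def detect_with_invariant(n: int, fault_at=None) -> bool:
    """Invariant check: no duplicated work, only one comparison after the loop."""
    i = 0
    offset = 0
    while i < n:
        offset += 4
        if fault_at == i:
            offset = flip_bit(offset, 3)  # same injected fault as above
        i += 1
    # An error is detected when the assumed invariant offset == 4 * i is violated.
    return offset != 4 * i
```

In this sketch the invariant version runs roughly half the loop-body updates and keeps no shadow variables, which is the kind of saving the abstract reports. It only detects faults in values that an invariant constrains, which is consistent with catching a subset of the errors that full duplication catches.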
