An experimental evaluation of the effectiveness of automatic rule-based transformations for safety-critical applications

In recent years, computer systems have been asked to perform a growing number of safety-critical tasks. In particular, safety-critical computer-based applications are reaching markets where cost is a major issue, so solutions are required that combine fault tolerance with low cost. In this paper, a software-based approach for developing safety-critical applications is analyzed. Using an ad-hoc tool that implements the proposed technique, several benchmark applications have been hardened against transient errors. Fault injection campaigns have been performed to evaluate the fault detection capability of the hardened applications. In addition, the proposed technique is compared with the Algorithm-Based Fault Tolerance (ABFT) approach. Experimental results show that the proposed approach is far more effective than ABFT in terms of fault detection capability when transient faults are injected in data and code memory, at the cost of an increased memory overhead. Moreover, the performance penalty introduced by the proposed technique is comparable to, and sometimes lower than, that of ABFT.
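For readers unfamiliar with the ABFT baseline named above, the following is a minimal sketch of the classic checksum scheme for matrix multiplication. It is an illustration only, not the code evaluated in the paper: the function names, the 2x2 integer operands, and the single bit flip standing in for a transient fault are all assumptions made for this example.

```python
# Illustrative sketch of checksum-based ABFT for matrix multiplication.
# All names and values here are hypothetical, chosen for the example.

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def add_column_checksum_row(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def add_row_checksum_column(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]

def find_checksum_mismatches(c_full):
    """Return (bad_rows, bad_cols) where the data part of the full
    checksum matrix disagrees with its stored checksums."""
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    return bad_rows, bad_cols

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# The product of a column-checksum matrix and a row-checksum matrix
# is a full checksum matrix: its last row and column hold the sums.
c_full = matmul(add_column_checksum_row(a), add_row_checksum_column(b))
clean_result = find_checksum_mismatches(c_full)

# Emulate a transient fault by flipping one bit of one result element.
c_full[0][1] ^= 4
faulty_result = find_checksum_mismatches(c_full)

print(clean_result)   # no mismatches before the fault
print(faulty_result)  # the corrupted row and column after the fault
```

The sketch uses integers so that checksums can be compared exactly; with floating-point data the comparison would need a tolerance. It also shows why ABFT coverage is tied to the data structures the checksums protect: a fault that strikes code memory or a variable outside the encoded matrices is not covered by these checks.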
