Influence of software on fault-tolerant microprocessor control system dependability

The fault tolerance of the control system described in this paper is based on cold standby and software redundancy. The control system is a modular microprocessor system consisting of standard nonredundant modules and a number of specific additional modules. It validates the active microprocessor unit and periodically updates (actualizes) the spare unit. If the active unit fails, control is switched from the failed unit to the spare unit. Fault detection relies on a watchdog timer, checksums, data duplication, and similar mechanisms. Fault tolerance is evaluated by injecting faults into the bus lines of the active unit, using a fault injection system that also records the reaction of the system under test. The control system is tested with both nonredundant and redundant software, and the influence of the watchdog period and of the standby unit update period on the number of faults detected in the active and standby units is observed. This experimental method is thus one way to compare the effectiveness of fault-tolerance methods. The fault tolerance of the control system can be increased easily by using software redundancy and software methods.
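To make the software detection mechanisms named in the abstract concrete, the following is a minimal sketch of a simulated fault injection campaign. It is not the paper's implementation: the 8-bit additive checksum, the bit-flip fault model, and all function names (`checksum`, `inject_bit_flips`, `run_campaign`) are assumptions chosen for illustration, and the watchdog timer is left out.

```python
import random


def checksum(words):
    """8-bit additive checksum over a list of byte values (assumed scheme)."""
    return sum(words) & 0xFF


def inject_bit_flips(words, flips):
    """Return a copy of words with the given (index, bit) positions inverted.

    Each flip stands in for a transient fault on a data line.
    """
    faulty = list(words)
    for index, bit in flips:
        faulty[index] ^= 1 << bit
    return faulty


def detected_by_checksum(words, stored_sum):
    """A fault is detected when the recomputed checksum differs."""
    return checksum(words) != stored_sum


def detected_by_duplication(primary, duplicate):
    """A fault is detected when the two copies of the data disagree."""
    return primary != duplicate


def run_campaign(data, trials, seed=0):
    """Inject one random single-bit fault per trial and count detections."""
    rng = random.Random(seed)
    stored_sum = checksum(data)
    duplicate = list(data)  # redundant copy kept by the redundant software
    counts = {"checksum": 0, "duplication": 0}
    for _ in range(trials):
        flip = (rng.randrange(len(data)), rng.randrange(8))
        faulty = inject_bit_flips(data, [flip])
        counts["checksum"] += detected_by_checksum(faulty, stored_sum)
        counts["duplication"] += detected_by_duplication(faulty, duplicate)
    return counts


if __name__ == "__main__":
    data = [0x01, 0x00, 0x7F, 0x80]
    print(run_campaign(data, trials=100))
    # Two compensating flips leave the additive checksum unchanged,
    # so only the duplicated copy reveals this fault.
    masked = inject_bit_flips(data, [(0, 0), (1, 0)])
    print(detected_by_checksum(masked, checksum(data)),
          detected_by_duplication(masked, data))
```

In this sketch every single-bit fault is caught by both mechanisms, while a pair of compensating flips escapes the additive checksum and is caught only by the duplicated copy. Comparing such detection counts across mechanisms is the kind of comparison the experimental method is meant to support.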
