Evaluating processor-behavior and three error-detection mechanisms using physical fault-injection

An approach for assessing the impact of physically injected transient faults on processor execution is described and evaluated. The fault injection is based on two complementary methods: (1) heavy-ion radiation and (2) power supply disturbances. A total of 12,000 transient faults were injected into the target microprocessor, a Motorola MC6809E 8-bit CPU, running three different workloads. In the evaluation, control-flow errors were distinguished from errors that had no effect on the correct flow of control, and errors that led to wrong results were separated from those that did not affect the results. Errors that affected neither the control flow nor the results were also identified. The effects of errors on the registers and signals of the processor are characterized, and the dependency of error rates on the workload is demonstrated. Three error-detection mechanisms (two software-based mechanisms and one watchdog timer) were combined and used to characterize the detected and undetected errors. More than 87% of all errors and 93% of the control-flow errors could be detected. In a separate test, the efficiency of an isolated watchdog timer was evaluated; its coverage was only 62%. The results indicate that fault-injection methods, workloads, and programming languages each affect control flow, coverage, latency, and error rates differently.
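The classification and coverage figures above can be illustrated with a minimal sketch. It assumes each observed error is logged with three flags (control-flow error, wrong result, detected); the `ErrorRecord` type, the `coverage` helper, and the toy records are hypothetical and are not taken from the study's data or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRecord:
    """One observed error (hypothetical record layout, not the paper's)."""
    control_flow_error: bool  # did the error disturb the flow of control?
    wrong_result: bool        # did the workload produce a wrong result?
    detected: bool            # did any error-detection mechanism fire?


def coverage(records, only_control_flow=False):
    """Fraction of the selected errors that were detected.

    Returns None when no error matches the selection, since coverage
    is undefined for an empty set.
    """
    selected = [
        r for r in records
        if r.control_flow_error or not only_control_flow
    ]
    if not selected:
        return None
    return sum(r.detected for r in selected) / len(selected)


# Toy data for illustration only; these are not measurements from the paper.
toy = [
    ErrorRecord(control_flow_error=True, wrong_result=True, detected=True),
    ErrorRecord(control_flow_error=True, wrong_result=False, detected=True),
    ErrorRecord(control_flow_error=False, wrong_result=True, detected=False),
    ErrorRecord(control_flow_error=False, wrong_result=False, detected=True),
]

print(coverage(toy))                          # 0.75
print(coverage(toy, only_control_flow=True))  # 1.0
```

Under this reading, coverage is simply the number of detected errors divided by the number of errors in the class of interest, which is why the figure for all errors and the figure for control-flow errors can differ.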
