Design and Evaluation of System-Level Checks for On-Line Control Flow Error Detection

This paper evaluates the concurrent error detection capabilities of system-level checks, using fault and error injection. The checks comprise application and system level mechanisms to detect control flow errors. We propose Enhanced Control-Flow Checking Using Assertions (ECCA). In ECCA, branch-free intervals (BFI) in a given high or intermediate level program are identified and the entry and exit points of the intervals are determined. BFls are then grouped into blocks, the size of which is determined through a performance/overhead analysis. The blocks are then fortified with preinserted assertions. For the high level ECCA, we describe an implementation of ECCA through a preprocessor that will automatically insert the necessary assertions into the program. Then, we describe the intermediate implementation possible through modifications made on gee to make it ECCA capable. The fault detection capabilities of the checks are evaluated both analytically and experimentally. Fault injection experiments are conducted using FERRARI to determine the fault coverage of the proposed techniques.

[1]  Jacob A. Abraham,et al.  FERRARI: A Flexible Software-Based Fault and Error Injection System , 1995, IEEE Trans. Computers.

[2]  Kien A. Hua,et al.  Design of Systems with Concurrent Error Detection Using Software Redundancy , 1986, FJCC.

[3]  V.S.S. Nair,et al.  Algorithm-based fault tolerance for noncomputationally intensive applications , 1994, Optics & Photonics.

[4]  Michael R. Lyu Software Fault Tolerance , 1995 .

[5]  David J. Lu Watchdog Processors and Structural Integrity Checking , 1982, IEEE Transactions on Computers.

[6]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[7]  Jacob A. Abraham,et al.  Evaluation of integrated system-level checks for on-line error detection , 1996, Proceedings of IEEE International Computer Performance and Dependability Symposium.

[8]  Jacob A. Abraham,et al.  Design and evaluation of automated checks for signal processing applications , 1996, Optics & Photonics.

[9]  Edward J. McCluskey,et al.  Concurrent Error Detection Using Watchdog Processors - A Survey , 1988, IEEE Trans. Computers.

[10]  John Paul Shen,et al.  Continuous signature monitoring: low-cost concurrent detection of processor control errors , 1990, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[11]  Masood Namjoo,et al.  Techniques for Concurrent Testing of VLSI Processor Operation , 1982, ITC.

[12]  Edward J. McCluskey,et al.  Control-Flow Checking Using Watchdog Assists and Extended-Precision Checksums , 1990, IEEE Trans. Computers.

[13]  Jacob A. Abraham,et al.  Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.