Evaluation of integrated system-level checks for on-line error detection

This paper evaluates the capabilities of an integrated system level error detection technique using fault and error injection. This technique is comprised of two software level mechanisms for concurrent error detection, control flow checking using assertions (CCA) and data error checking using application specific data checks. Over 300,000 faults and errors were injected and the analysis of the results reveals that the CCA detects 95% of all the errors while the data checks are able to detect subtle errors that go undetected by the CCA technique. Latency measurements also shelved that the CCA technique is faster than the data checks in detecting the error. When both techniques were incorporated, the system was able to detect over 98% of all injected errors.

[1]  Edward J. McCluskey,et al.  Control-Flow Checking Using Watchdog Assists and Extended-Precision Checksums , 1990, IEEE Trans. Computers.

[2]  Edward J. McCluskey,et al.  Concurrent Error Detection Using Watchdog Processors - A Survey , 1988, IEEE Trans. Computers.

[3]  Kien A. Hua,et al.  DESIGN AND EVALUATION OF EXECUTABLE ASSERTIONS FOR CONCURRENT ERROR DETECTION. , 1987 .

[4]  Jacob A. Abraham,et al.  FERRARI: A Flexible Software-Based Fault and Error Injection System , 1995, IEEE Trans. Computers.

[5]  Suku Nair,et al.  Algorithm-Based Fault Tolerance on a Hypercube Multiprocessor , 1990, IEEE Trans. Computers.

[6]  Kien A. Hua,et al.  Design of Systems with Concurrent Error Detection Using Software Redundancy , 1986, FJCC.

[7]  Johan Karlsson,et al.  Evaluation of error detection schemes using fault injection by heavy-ion radiation , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[8]  Lori A. Clarke,et al.  A Formal Model of Program Dependences and Its Implications for Software Testing, Debugging, and Maintenance , 1990, IEEE Trans. Software Eng..

[9]  Johan Karlsson,et al.  Two software techniques for on-line error detection , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.

[10]  Stephen S. Yau,et al.  An Approach to Concurrent Control Flow Checking , 1980, IEEE Transactions on Software Engineering.

[11]  Stephen S. Yau,et al.  Concurrent software fault detection , 1975, IEEE Transactions on Software Engineering.

[12]  David J. Lu Watchdog Processors and Structural Integrity Checking , 1982, IEEE Transactions on Computers.

[13]  John Paul Shen A roving monitoring processor for detection of control flow errors in multiple processor systems , 1987 .

[14]  John Paul Shen,et al.  Continuous signature monitoring: low-cost concurrent detection of processor control errors , 1990, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[15]  Jacob A. Abraham,et al.  Control Flow Checking in Object-Based Distributed Systems , 1993 .

[16]  Wen-mei W. Hwu,et al.  A software based approach to achieving optimal performance for signature control flow checking , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.