Design, verification, and validation of self-checking software components

We propose a formal approach for adding fault detection to software. An assertion-based formalism is used to represent specifications and to verify their completeness and consistency. The specification is used to generate a flow graph, from which an exemplar-path tree is constructed. This representation is then used to generate an input set that exercises and verifies the implementation. Previous software fault-tolerance (SFT) techniques emphasized algorithm-based fault tolerance (ABFT), which focused on detecting hardware faults that corrupted the contents of data structures. We propose a method that also detects hardware faults that cause program-flow errors. Our technique embeds two types of software checks. The first is based on the ABFT techniques described in the literature. The second detects faults that cause program-flow errors. The exemplar-path tree provides information that can be used to predict a future program location, given the current location. During execution, program locations are recorded along with the expected locations determined from the exemplar-path tree. This information is then used to verify that the future location is executed as expected. Hardware fault coverage has been estimated through experiments with the fault-injection tool SOFIT. Faults of differing durations were injected into memory, the address bus, the data bus, and CPU registers. The data presented demonstrate the effectiveness of the method for detecting hardware faults.
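To make the first type of check concrete, here is a minimal sketch of a classic ABFT check, the checksum-matrix scheme for matrix multiplication (Huang and Abraham, 1984). It is not taken from this paper, which only states that its first check is based on ABFT techniques from the literature. All function names are hypothetical, and integer matrices are assumed so that checksums can be compared exactly (floating-point data would need a tolerance).

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def column_checksum(a: Matrix) -> Matrix:
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def row_checksum(b: Matrix) -> Matrix:
    """Append to each row of b an extra element holding that row's sum."""
    return [row + [sum(row)] for row in b]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product (the computation being protected)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def checked_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Column-checksum A times row-checksum B yields a full-checksum product:
    the last row holds column sums and the last column holds row sums."""
    return matmul(column_checksum(a), row_checksum(b))


def verify_full_checksum(c: Matrix) -> bool:
    """True if every row and column checksum of c is consistent."""
    rows_ok = all(sum(row[:-1]) == row[-1] for row in c)
    cols_ok = all(
        sum(c[i][j] for i in range(len(c) - 1)) == c[-1][j]
        for j in range(len(c[0]) - 1)
    )
    return rows_ok and cols_ok


def locate_single_error(c: Matrix) -> Optional[Tuple[int, int]]:
    """A single corrupted data element breaks exactly one row and one column
    checksum; their intersection locates it. Returns None otherwise."""
    bad_rows = [i for i, row in enumerate(c[:-1]) if sum(row[:-1]) != row[-1]]
    bad_cols = [
        j for j in range(len(c[0]) - 1)
        if sum(c[i][j] for i in range(len(c) - 1)) != c[-1][j]
    ]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None


if __name__ == "__main__":
    c = checked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(verify_full_checksum(c))   # consistent product
    c[0][1] += 1                     # simulate a fault corrupting one element
    print(verify_full_checksum(c), locate_single_error(c))
```

The check costs one extra row and column of work, which is why ABFT targets data-structure corruption cheaply; it says nothing about whether the program took the correct path, which is the gap the second type of check addresses.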

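The second type of check can be sketched as follows, under stated assumptions: the flow graph is a hypothetical table of basic-block labels, the exemplar-path tree is built by expanding that graph depth-first and stopping a branch when a block repeats on the current path (so each loop is entered once), and "predicting a future location" is simplified to the set of blocks that follow the current one anywhere in the tree. The paper's actual tree construction and encoding may differ; every name here is illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical flow graph of a summing loop: block -> possible next blocks.
FLOW_GRAPH: Dict[str, List[str]] = {
    "entry": ["loop_test"],
    "loop_test": ["loop_body", "exit"],
    "loop_body": ["loop_test"],
    "exit": [],
}


@dataclass
class PathNode:
    label: str
    children: List["PathNode"] = field(default_factory=list)


def build_path_tree(graph: Dict[str, List[str]], label: str,
                    on_path: Tuple[str, ...] = ()) -> PathNode:
    """Expand the graph into a tree; a block already on the path becomes a leaf."""
    node = PathNode(label)
    if label in on_path:
        return node
    for nxt in graph.get(label, []):
        node.children.append(build_path_tree(graph, nxt, on_path + (label,)))
    return node


def exemplar_paths(node: PathNode, prefix: Tuple[str, ...] = ()) -> List[List[str]]:
    """Root-to-leaf paths: candidates for generating test inputs."""
    path = prefix + (node.label,)
    if not node.children:
        return [list(path)]
    return [p for child in node.children for p in exemplar_paths(child, path)]


def successor_table(node: PathNode,
                    table: Optional[Dict[str, Set[str]]] = None) -> Dict[str, Set[str]]:
    """For each block, collect the blocks that follow it anywhere in the tree."""
    if table is None:
        table = {}
    table.setdefault(node.label, set()).update(c.label for c in node.children)
    for child in node.children:
        successor_table(child, table)
    return table


@dataclass
class FlowMonitor:
    """Records each executed checkpoint and checks it against the locations
    predicted from the previous checkpoint."""
    successors: Dict[str, Set[str]]
    entry: str = "entry"
    trace: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    def checkpoint(self, location: str) -> bool:
        expected = self.successors.get(self.trace[-1], set()) if self.trace else {self.entry}
        ok = location in expected
        if not ok:
            prev = self.trace[-1] if self.trace else "<start>"
            self.errors.append(f"program-flow error: {location!r} after {prev!r}")
        self.trace.append(location)
        return ok


def run_sum(values: List[int], monitor: FlowMonitor) -> int:
    """The monitored routine, with a checkpoint at the start of each block."""
    monitor.checkpoint("entry")
    total, i = 0, 0
    while True:
        monitor.checkpoint("loop_test")
        if i >= len(values):
            break
        monitor.checkpoint("loop_body")
        total += values[i]
        i += 1
    monitor.checkpoint("exit")
    return total
```

A fault that, for example, corrupts a branch target so that control jumps from `entry` straight to `exit` would be reported, because `exit` is not among the locations predicted after `entry`. Faults that divert control to a legal successor (for instance, leaving the loop one iteration early) are not caught by this one-step check alone; that is where the data checks of the first type complement it.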