An object-oriented approach for implementing algorithm-based fault tolerance

The authors demonstrate the practical use of an object-oriented system to incorporate fault tolerance and reliability into data objects. The object-based fault tolerance scheme uses abstraction to conceal the algorithm-based fault tolerance layers, so that a layer of fault tolerance can be added to data objects without altering how those objects are used. It is shown that the C++ class mechanisms of overloading and derivation make the added fault tolerance transparent to code that uses the original data objects. To demonstrate the feasibility of this approach, a C++ library of matrix functions is presented and a layer of fault tolerance is added around its matrix data objects. The weighted checksum code technique was implemented to create fault-tolerant matrix data objects, which allows programmers to add algorithm-based fault tolerance to existing matrix applications without modifying the original application. The implementation was experimentally evaluated using a software fault-injection tool that emulated realistic hardware faults. An error coverage of over 96% was obtained with a memory overhead of 28%. The empirical results confirm the viability of the approach by demonstrating that object-based encapsulation is a valid method for transparently implementing algorithm-based fault tolerance.
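To make the idea of layering checksums through derivation and overloading concrete, the following is a minimal sketch and not the authors' library. The names `Matrix`, `FtMatrix`, `checksums`, and `consistent` are hypothetical, as are the weights w_i = (i+1)^p for p = 0 and 1. It relies on the identity (w^T A) B = w^T (A B): the weighted column checksums of a product can be predicted from the operands and compared with those of the computed result. As a simplifying assumption, the checksums are recomputed on demand rather than stored with the data, so the sketch only checks the multiplication itself and performs detection without correction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Plain matrix: stands in for the original, unprotected data object.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    double operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Ordinary, unchecked matrix product.
Matrix operator*(const Matrix& a, const Matrix& b) {
    if (a.cols() != b.rows()) throw std::invalid_argument("shape mismatch");
    Matrix c(a.rows(), b.cols());
    for (std::size_t i = 0; i < a.rows(); ++i)
        for (std::size_t j = 0; j < b.cols(); ++j)
            for (std::size_t k = 0; k < a.cols(); ++k)
                c(i, j) += a(i, k) * b(k, j);
    return c;
}

// Derived class (hypothetical): same interface, plus weighted column checksums.
class FtMatrix : public Matrix {
public:
    using Matrix::Matrix;                          // reuse Matrix(rows, cols)
    explicit FtMatrix(const Matrix& m) : Matrix(m) {}

    // Returns sum_i w_i * a(i, j) for each column j, with w_i = (i + 1)^power.
    std::vector<double> checksums(int power) const {
        std::vector<double> s(cols(), 0.0);
        for (std::size_t i = 0; i < rows(); ++i) {
            const double w = std::pow(static_cast<double>(i + 1), power);
            for (std::size_t j = 0; j < cols(); ++j) s[j] += w * (*this)(i, j);
        }
        return s;
    }
};

// True if c agrees with a * b under both checksum weightings.
// The tolerance absorbs floating-point rounding; its value is an assumption.
bool consistent(const FtMatrix& a, const Matrix& b, const FtMatrix& c,
                double tol = 1e-9) {
    if (a.cols() != b.rows() || c.rows() != a.rows() || c.cols() != b.cols())
        return false;
    for (int power = 0; power <= 1; ++power) {
        const std::vector<double> wa = a.checksums(power);   // w^T A
        const std::vector<double> got = c.checksums(power);  // w^T C
        for (std::size_t j = 0; j < c.cols(); ++j) {
            double expected = 0.0;                           // ((w^T A) B)_j
            for (std::size_t k = 0; k < b.rows(); ++k) expected += wa[k] * b(k, j);
            if (std::fabs(expected - got[j]) > tol) return false;
        }
    }
    return true;
}

// Overload selected whenever both operands are FtMatrix; callers still write a * b.
FtMatrix operator*(const FtMatrix& a, const FtMatrix& b) {
    FtMatrix c(static_cast<const Matrix&>(a) * static_cast<const Matrix&>(b));
    if (!consistent(a, b, c)) throw std::runtime_error("checksum mismatch");
    return c;
}
```

In this sketch, application code that declares its operands as `FtMatrix` instead of `Matrix` keeps writing `a * b` unchanged; overload resolution picks the checked product because it matches the derived type exactly. Using a second weighting alongside the plain sum is what lets a weighted checksum scheme go on to locate a faulty element, a step this sketch leaves out.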
