Software Based Fault Tolerance against Byzantine Failures

The proposed software technique is a low-cost and effective approach to designing Byzantine fault-tolerant computing applications that are not highly safety-critical. It does not rely on multiple versions of software running simultaneously on multiple machines. Instead, it masks random hardware errors while an application is executing by adopting an enhanced single-version program (ESVP) scheme. It is not intended to eliminate software design bugs: the code is assumed to be correct, and faulty behavior is attributed only to transient or Byzantine faults affecting the application system. The approach is also easy to implement. To detect state-transition faults as well, the test program's present state is compared with its pre-computed state. ESVP is intended for computer-based process monitoring systems.
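The state comparison mentioned above can be illustrated with a minimal sketch. This is not the paper's ESVP implementation: the state names, the `EXPECTED_NEXT` table, and the functions `step`, `checked_step`, and `run` are all hypothetical, and the recovery policy (falling back on the pre-computed state) is an assumption made for illustration only.

```python
# Hypothetical sketch of a state-transition check; not the paper's ESVP code.
# Assumption: a monitoring loop cycles through three states, and the legal
# successor of every state is known in advance (the "pre-computed state").
EXPECTED_NEXT = {"READ": "CHECK", "CHECK": "REPORT", "REPORT": "READ"}


class StateTransitionFault(Exception):
    """Raised when the observed next state differs from the pre-computed one."""


def step(state):
    """The application's own transition logic, assumed to be bug-free."""
    return {"READ": "CHECK", "CHECK": "REPORT", "REPORT": "READ"}[state]


def checked_step(state, transition=step):
    """Run one transition and compare the result with the pre-computed state."""
    expected = EXPECTED_NEXT[state]
    actual = transition(state)
    if actual != expected:
        raise StateTransitionFault(
            f"from {state}: expected {expected}, observed {actual}"
        )
    return actual


def run(cycles, transition=step, start="READ"):
    """Execute `cycles` transitions; on a detected fault, count it and
    continue from the pre-computed state (an assumed recovery policy)."""
    state, faults = start, 0
    for _ in range(cycles):
        try:
            state = checked_step(state, transition)
        except StateTransitionFault:
            faults += 1
            state = EXPECTED_NEXT[state]
    return state, faults


def stuck_transition(state):
    """Simulated hardware fault: the state register always reads REPORT."""
    return "REPORT"
```

With the fault-free `step`, `run(3)` completes one full cycle with no faults detected. Passing `stuck_transition` instead simulates a corrupted state register: each illegal transition is detected by the comparison and masked by continuing from the pre-computed state.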
