Application-semantics-driven assertions toward fault-tolerant computing

Based on the semantics of an application's processing logic, we identify the most critical and sensitive parts of the application, derive a set of conditions, or assertions, among various diagnostic checkpoint variables, and enhance the processing logic so that it can detect operational or environmental faults at run time. This paper examines how a single-version algorithm can achieve software-based fault tolerance by building carefully designed execution-time checks into a computing application. The algorithm developed here relies on diagnostic assertions and checkpoints derived from the application's semantics. Errors are detected through these checkpoints and through periodic execution of the application with known test data, verifying the observed result against the known result. This work does not attempt to correct bit errors with conventional error-correcting codes. Electrical transients or small particles striking the circuit often cause random errors or faults in data and program flow; the algorithm described here detects and recovers from such transient or operational failures, for a specific problem, using only one version of a program running on a single machine. It also uses run-time signatures, and their validation, to detect faults. The approach does not aim to tolerate software design bugs.
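A minimal sketch, in Python, of how the three detection mechanisms named above might fit around a single computation: assertions derived from the meaning of the result (a sort must preserve length and element sum and produce a non-decreasing sequence), a known-answer self-test run on fixed test data, and an order-sensitive control-flow signature accumulated at checkpoints. The sorting task, the hypothetical names `checked_sort`, `self_test`, `run_with_recovery` and `update_signature`, the block IDs, the polynomial signature, the fault-injection hook and retry-as-recovery are all illustrative assumptions, not the paper's algorithm. For simplicity the self-test runs before each attempt rather than on a timer.

```python
# Hypothetical illustration only: task, block IDs, signature formula,
# fault-injection hook and retry-based recovery are assumptions.


class FaultDetected(Exception):
    """Raised when a run-time check detects a fault."""


# Hypothetical checkpoint IDs, one per basic block of the routine.
BLOCK_IDS = {"load": 0x11, "sort": 0x22, "check": 0x33}
BLOCK_SEQUENCE = ("load", "sort", "check")  # intended execution order


def update_signature(sig, block_id):
    """Order-sensitive running signature (a simple polynomial hash)."""
    return (sig * 31 + block_id) & 0xFFFFFFFF


def _expected_signature():
    sig = 0
    for name in BLOCK_SEQUENCE:
        sig = update_signature(sig, BLOCK_IDS[name])
    return sig


EXPECTED_SIGNATURE = _expected_signature()


def checked_sort(data, corrupt=None):
    """Insertion sort guarded by assertions derived from what 'sorted' means.

    `corrupt` is a test hook that simulates a transient data error.
    """
    sig = update_signature(0, BLOCK_IDS["load"])
    n_in, sum_in = len(data), sum(data)  # diagnostic checkpoint variables
    work = list(data)

    sig = update_signature(sig, BLOCK_IDS["sort"])
    for i in range(1, len(work)):
        key, j = work[i], i - 1
        while j >= 0 and work[j] > key:
            work[j + 1] = work[j]
            j -= 1
        work[j + 1] = key
    if corrupt is not None:
        corrupt(work)

    sig = update_signature(sig, BLOCK_IDS["check"])
    # Semantic assertions: sorting preserves length and element sum ...
    if len(work) != n_in or sum(work) != sum_in:
        raise FaultDetected("length or checksum assertion failed")
    # ... and yields a non-decreasing sequence.
    if any(a > b for a, b in zip(work, work[1:])):
        raise FaultDetected("ordering assertion failed")
    # Control-flow check: a mismatch indicates a skipped, repeated or
    # reordered block.
    if sig != EXPECTED_SIGNATURE:
        raise FaultDetected("control-flow signature mismatch")
    return work


KNOWN_INPUT = [5, 3, 8, 1]
KNOWN_OUTPUT = [1, 3, 5, 8]


def self_test():
    """Known-answer test: run on fixed data and compare with the known result."""
    try:
        return checked_sort(KNOWN_INPUT) == KNOWN_OUTPUT
    except FaultDetected:
        return False


def run_with_recovery(data, retries=3, corrupt_first=None):
    """Re-execute on detection, assuming the fault is transient."""
    for attempt in range(retries):
        try:
            if not self_test():
                raise FaultDetected("known-answer self-test failed")
            corrupt = corrupt_first if attempt == 0 else None
            return checked_sort(data, corrupt)
        except FaultDetected:
            continue  # assumed transient: retry from the start
    raise FaultDetected(f"fault persisted after {retries} attempts")


def flip_low_bit(work):
    """Simulate a single-bit transient error in the first element."""
    work[0] ^= 1


print(run_with_recovery([4, 2, 9, 7], corrupt_first=flip_low_bit))  # [2, 4, 7, 9]
```

In the usage line, the first attempt is corrupted (2 becomes 3), the checksum assertion fires, and re-execution produces the correct result; a fault that survives every retry is reported instead. These checks cost far less than running a second program version. They are not complete, though: the sum assertion misses compensating errors such as one element incremented and another decremented, which is why it is paired with the ordering check, and even the pair cannot catch every corruption.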
