In this paper we focus on automated techniques to enhance the fault-tolerance of a nonmasking fault-tolerant program to masking. A masking program continually satisfies its specification even if faults occur. By contrast, a nonmasking program merely guarantees that, after faults stop occurring, the program recovers to states from which it continually satisfies its specification. Until the recovery is complete, however, a nonmasking program can violate its (safety) specification. Thus, the problem of enhancing fault-tolerance from nonmasking to masking requires that safety be added and recovery be preserved. We focus on this enhancement problem for high atomicity programs, where each process can read all variables, and for distributed programs, where restrictions are imposed on what processes can read and write. We present a sound and complete algorithm for high atomicity programs and a sound algorithm for distributed programs. We also argue that our algorithms are simpler than previous algorithms, which add masking fault-tolerance to a fault-intolerant program. Hence, these algorithms can partially reap the benefits of automation when the cost of adding masking fault-tolerance to a fault-intolerant program is high. To illustrate these algorithms, we show how the masking fault-tolerant programs for triple modular redundancy and Byzantine agreement can be obtained by enhancing the fault-tolerance of the corresponding nonmasking versions. We also discuss how the derivation of these programs is simplified when we begin with a nonmasking fault-tolerant program.
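The difference between nonmasking and masking fault-tolerance can be sketched on a toy model of triple modular redundancy. The sketch below is only an illustration under stated assumptions: the three-input state, the two step functions, and the helper names (`majority`, `run`, `is_safe`, `recovers`) are hypothetical and are not the programs or algorithms of the paper. It assumes a fault has corrupted at most one input, and it takes safety to mean that the output is never set to a value other than the majority.

```python
from collections import Counter

def majority(inputs):
    """Value held by at least two of the three inputs (assumes at most one is corrupted)."""
    return Counter(inputs).most_common(1)[0][0]

def nonmasking_step(inputs, out):
    """Hypothetical nonmasking action: copy input 0 first, repair the output afterwards."""
    if out is None:
        return inputs[0]            # may copy a corrupted value
    if out != majority(inputs):
        return majority(inputs)     # recovery action
    return out

def masking_step(inputs, out):
    """Hypothetical masking action: only ever write the majority value."""
    if out is None:
        return majority(inputs)
    return out

def run(step, inputs, steps=3):
    """Apply `step` repeatedly from an unset output and record each output value."""
    out, trace = None, []
    for _ in range(steps):
        out = step(inputs, out)
        trace.append(out)
    return trace

def is_safe(trace, inputs):
    """Safety (as assumed here): every written output equals the majority."""
    return all(v == majority(inputs) for v in trace)

def recovers(trace, inputs):
    """Recovery (as assumed here): the run ends with the output equal to the majority."""
    return trace[-1] == majority(inputs)

corrupted = (0, 1, 1)  # a fault has flipped input 0; the uncorrupted value is 1

nonmasking_trace = run(nonmasking_step, corrupted)
masking_trace = run(masking_step, corrupted)

print(nonmasking_trace, is_safe(nonmasking_trace, corrupted), recovers(nonmasking_trace, corrupted))
print(masking_trace, is_safe(masking_trace, corrupted), recovers(masking_trace, corrupted))
```

In this toy run the nonmasking version briefly writes the corrupted value before repairing it, so it recovers but is not safe, while the masking version is both safe and recovering. Strengthening the first action so that it cannot copy a corrupted input is the kind of change that adds safety while preserving recovery.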