Complexity issues in automated synthesis of failsafe fault-tolerance

We focus on the problem of synthesizing failsafe fault-tolerance where fault-tolerance is added to an existing (fault-intolerant) program. A failsafe fault-tolerant program satisfies its specification (including safety and liveness) in the absence of faults. However, in the presence of faults, it satisfies its safety specification. We present a somewhat unexpected result that, in general, the problem of synthesizing failsafe fault-tolerant distributed programs from their fault-intolerant version is NP-complete in the state space of the program. We also identify a class of specifications, monotonic specifications, and a class of programs, monotonic programs, for which the synthesis of failsafe fault-tolerance can be done in polynomial time (in program state space). As an illustration, we show that the monotonicity restrictions are met for commonly encountered problems, such as Byzantine agreement, distributed consensus, and atomic commitment. Furthermore, we evaluate the role of these restrictions in the complexity of synthesizing failsafe fault-tolerance. Specifically, we prove that if only one of these conditions is satisfied, the synthesis of failsafe fault-tolerance is still NP-complete. Finally, we demonstrate the application of monotonicity property in enhancing the fault-tolerance of (distributed) nonmasking fault-tolerant programs to masking.

[1]  Leslie Lamport,et al.  The Byzantine Generals Problem , 1982, TOPL.

[2]  Edsger W. Dijkstra,et al.  A Discipline of Programming , 1976 .

[3]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[4]  Bowen Alpern,et al.  Defining Liveness , 1984, Inf. Process. Lett..

[5]  Anish Arora,et al.  Closure and Convergence: A Foundation of Fault-Tolerant Computing , 1993, IEEE Trans. Software Eng..

[6]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1983, PODS '83.

[7]  Miroslaw Malek,et al.  The consensus problem in fault-tolerant computing , 1993, CSUR.

[8]  Mukesh Singhal,et al.  Advanced concepts in operating systems : distributed, database, and multiprocessor operating systems , 1993 .

[9]  P. Lincoln,et al.  Byzantine Agreement with Authentication : Observations andApplications in Tolerating Hybrid and Link Faults , 1995 .

[10]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1985, JACM.

[11]  Paul C. Attie,et al.  Synthesis of concurrent programs for an atomic read/write model of computation , 2001, TOPL.

[12]  Anish Arora,et al.  Compositional design of multitolerant repetitive byzantine agreement , 1997, WSS.

[13]  Arshad Jhumka,et al.  Automating the Addition of Fail-Safe Fault-Tolerance: Beyond Fusion-Closed Specifications , 2004, FORMATS/FTRTFT.

[14]  Anish Arora,et al.  Component based design of fault-tolerance , 1999 .

[15]  Anish Arora,et al.  Automating the Addition of Fault-Tolerance , 2000, FTRTFT.

[16]  Anish Arora,et al.  Polynomial time synthesis of Byzantine agreement , 2001, Proceedings 20th IEEE Symposium on Reliable Distributed Systems.

[17]  Ali Ebnenasir,et al.  The complexity of adding failsafe fault-tolerance , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.

[18]  Mukesh Singhal,et al.  Advanced Concepts In Operating Systems , 1994 .

[19]  Rachid Guerraoui,et al.  Muteness Failure Detectors: Specification and Implementation , 1999, EDCC.

[20]  Anish Arora,et al.  Detectors and correctors: a theory of fault-tolerance components , 1998, Proceedings. 18th International Conference on Distributed Computing Systems (Cat. No.98CB36183).