On the Hardness of Adding Nonmasking Fault Tolerance

This paper investigates the complexity of adding nonmasking fault tolerance, where a nonmasking fault-tolerant program guarantees recovery from states reached due to the occurrence of faults to states from which its specifications are satisfied. We first demonstrate that adding nonmasking fault tolerance to low-atomicity programs (where processes have read/write restrictions with respect to the variables of other processes) is NP-complete in the size of the state space under an unfair or weakly fair scheduler. We then establish the surprising result that, even under strong fairness, adding nonmasking fault tolerance remains NP-hard. The NP-hardness proof is based on a polynomial-time reduction from the 3-SAT problem to the problem of designing self-stabilizing programs from their non-stabilizing versions, which is a special case of adding nonmasking fault tolerance. While it is known that designing self-stabilization under the assumption of strong fairness is polynomial, we demonstrate that adding self-stabilization to non-stabilizing programs is NP-hard under weak fairness.
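
To make the notion of recovery concrete, the following is a minimal sketch, not the paper's construction, of how self-stabilization could be checked for an explicitly enumerated transition system under an unfair scheduler. It assumes the standard closure and convergence conditions: legitimate states are closed under transitions, and every other state has no deadlock and no cycle outside the legitimate set. The names `is_self_stabilizing`, `states`, `transitions`, and `legit` are hypothetical. The sketch only verifies a given global transition relation; it does not address the synthesis problem under read/write restrictions whose hardness is studied here.

```python
from typing import Dict, Hashable, Iterable, Set

State = Hashable


def is_self_stabilizing(
    states: Iterable[State],
    transitions: Dict[State, Set[State]],
    legit: Set[State],
) -> bool:
    """Check closure and convergence for an explicit, finite transition system.

    Assumes an unfair scheduler: a state counts as converging only if it has
    at least one successor and every successor is already known to converge.
    """
    all_states = set(states)

    def successors(s: State) -> Set[State]:
        return set(transitions.get(s, set()))

    # Closure: no transition leaves the set of legitimate states.
    for s in legit:
        if any(t not in legit for t in successors(s)):
            return False

    # Convergence: grow the set of states from which every execution
    # reaches a legitimate state, until a fixed point is reached.
    converging = set(legit)
    changed = True
    while changed:
        changed = False
        for s in all_states - converging:
            nxt = successors(s)
            if nxt and nxt <= converging:
                converging.add(s)
                changed = True

    # States never added are deadlocked or lie on (or lead only through)
    # a cycle outside the legitimate states.
    return converging == all_states


# Toy example (hypothetical): state 0 is legitimate; 2 -> 1 -> 0 recovers.
recovering = {0: {0}, 1: {0}, 2: {1}}
# Same system with an extra transition 1 -> 2, which creates a cycle
# between the illegitimate states 1 and 2.
livelocked = {0: {0}, 1: {0, 2}, 2: {1}}

print(is_self_stabilizing({0, 1, 2}, recovering, {0}))
print(is_self_stabilizing({0, 1, 2}, livelocked, {0}))
```

The fixed-point loop runs in time polynomial in the number of global states, which is why checking a candidate program is not where the difficulty lies in this sketch; the toy systems above ignore the locality constraints of low-atomicity programs entirely.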
