Designing Masking Fault-Tolerance via Nonmasking Fault-Tolerance

Masking fault-tolerance guarantees that programs continually satisfy their specification in the presence of faults. By contrast, nonmasking fault-tolerance guarantees less: it ensures only that, once faults stop occurring, program executions converge to states from which programs continually (re)satisfy their specification. In this paper, we present a component-based method for the design of masking fault-tolerant programs. In this method, components are added to a fault-intolerant program in a stepwise manner: first, to transform the fault-intolerant program into a nonmasking fault-tolerant one and, then, to enhance the fault-tolerance from nonmasking to masking. We illustrate the method by designing programs for agreement in the presence of Byzantine faults, data transfer in the presence of message loss, triple modular redundancy in the presence of input corruption, and mutual exclusion in the presence of process fail-stops. These examples also demonstrate that the method accommodates a variety of fault-classes. The method provides alternative designs for programs usually designed with extant design methods, and it offers the potential for improved masking fault-tolerant programs.
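The stepwise idea can be illustrated with a minimal sketch of the triple modular redundancy example, assuming a fault-class in which at most one of three inputs is corrupted. The function names (`intolerant`, `nonmasking_trace`, `masking`, `majority`) and the detector/corrector split shown here are hypothetical illustrations, not the programs derived in the paper.

```python
from collections import Counter


def majority(inputs):
    """Return the value held by at least two of the three inputs, else None."""
    value, count = Counter(inputs).most_common(1)[0]
    return value if count >= 2 else None


def intolerant(inputs):
    # Fault-intolerant program: trusts the first input unconditionally.
    return inputs[0]


def nonmasking_trace(inputs):
    """Sequence of outputs over time for a nonmasking version (assumed design).

    The output may be wrong at first; an added corrector component then
    overwrites it with the majority value, so the program converges.
    """
    out = inputs[0]
    trace = [out]
    correct = majority(inputs)
    if out != correct:
        out = correct          # corrector: repair the output
        trace.append(out)
    return trace


def masking(inputs):
    """Masking version (assumed design): never emits a wrong output."""
    first = inputs[0]
    if first in inputs[1:]:    # detector: first input is corroborated
        return first
    return majority(inputs)    # corrector: fall back to the majority value


corrupted = [9, 5, 5]          # first input corrupted; the true value is 5
print(intolerant(corrupted))        # wrong output is exposed
print(nonmasking_trace(corrupted))  # wrong output, then corrected
print(masking(corrupted))           # only the correct output is ever produced
```

In this sketch the nonmasking version exposes an incorrect intermediate output before converging, while the masking version withholds the output until it is known to be safe. That difference is the one the abstract draws between the two tolerance levels.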
