Paper information - Towards a Theory of Replicated Processing

Towards a Theory of Replicated Processing

In the N-Modular Redundancy (NMR) approach, a computation is made reliable by executing it on several computers, and determining its results by a decision algorithm. This paper investigates a formal approach to the use of NMR in replicated distributed systems, for which it introduces a notion of correctness based on consistency with their non-replicated counterpart, and a local correctness criterion. We discuss how a replicated system component may be implemented by N base copies, a majority of which is non-faulty. The formal approach sheds light on the necessity of coordinating the copies and on the requirements they should satisfy; in particular the difficulty of replicating synchronous communication is pointed out. A practical approach is also briefly examined and shown to be consistent with the formal model. Inside every replicated system there is a non-replicated system trying to get out.
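The NMR idea summarized in the abstract (run the same computation on N copies, then apply a decision algorithm to their results) can be illustrated with a minimal sketch. It assumes the decision algorithm is a strict majority vote over hashable results and that copies are plain functions called in sequence. The names `majority_vote`, `run_nmr`, `square`, and `faulty` are hypothetical and are not taken from the paper, which treats communicating distributed components rather than local function calls.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of copies, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # A strict majority means more than half of all copies agree.
    return value if count > len(results) // 2 else None


def run_nmr(copies: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every copy on the same input and vote on the outputs."""
    return majority_vote([copy(x) for copy in copies])


def square(x: int) -> int:
    """A non-faulty copy (illustrative)."""
    return x * x


def faulty(x: int) -> int:
    """A hypothetical faulty copy that returns a wrong result."""
    return x * x + 1


if __name__ == "__main__":
    # Two of the three copies are non-faulty, so their result wins the vote.
    print(run_nmr([square, square, faulty], 4))
```

With three copies of which two are non-faulty, the vote masks the single faulty result. If no value reaches a strict majority, the sketch returns `None` rather than guessing.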

Luigi V. Mancini | Giuseppe Pappalardo

[1] Fred B. Schneider, et al. Implementing fault-tolerant services using the state machine approach: a tutorial, 1990, CSUR.

[2] Luigi V. Mancini. Modular redundancy in a message passing system, 1986, IEEE Transactions on Software Engineering.

[3] Luigi V. Mancini, et al. Formal specification of N-modular redundancy, 1986, CSC '86.

[4] Luigi V. Mancini, et al. Proving Correctness Properties of a Replicated Synchronous Program, 1989, Comput. J.

[5] Leslie Lamport, et al. The Implementation of Reliable Distributed Multiprocess Systems, 1978, Comput. Networks.

[6] Richard S. Bird. The promotion and accumulation strategies in transformational programming, 1984, TOPLAS.

[7] P. M. Melliar-Smith, et al. Formal Specification and Mechanical Verification of SIFT: A Fault-Tolerant Flight Control System, 1982, IEEE Transactions on Computers.

[8] Leslie Lamport, et al. The Byzantine Generals Problem, 1982, TOPLAS.

[9] Robert E. Lyons, et al. The Use of Triple-Modular Redundancy to Improve Computer Reliability, 1962, IBM J. Res. Dev.

[10] Algirdas Avizienis, et al. Fault Tolerance by Design Diversity: Concepts and Experiments, 1984, Computer.

[11] Jack Goldberg, et al. SIFT: A Provable Fault-Tolerant Computer for Aircraft Flight Control, 1980, IFIP Congress.

[12] L. Mancini, et al. The Join Algorithm: Ordering Messages in Replicated Systems, 1986.

[13] Eric C. Cooper. Replicated distributed programs, 1985, SOSP '85.

[14] Luigi V. Mancini, et al. Synchronizing events in replicated systems, 1989, J. Syst. Softw.

[15] Leslie Lamport, et al. Time, clocks, and the ordering of events in a distributed system, 1978, Commun. ACM.

[16] Fred B. Schneider, et al. Synchronization in Distributed Programs, 1982, TOPLAS.

[17] Santosh K. Shrivastava, et al. Exception Handling in Replicated Systems with Voting, 1986.

[18] C. A. R. Hoare, et al. Communicating Sequential Processes (Reprint), 1983, Commun. ACM.