Paper information - Agreeing on who is present and who is absent in a synchronous distributed system

Agreeing on who is present and who is absent in a synchronous distributed system

The author describes his system model and failure assumptions by precisely specifying the processor group membership problem. He then gives two protocols for solving this problem. The protocols provide all correct processors with consistent views of the processor group membership. They also guarantee bounded processor failure detection and join processing delays despite any number of performance failures that do not cause network partitioning. The first protocol provides very fast processor failure detection but can require significant message traffic overhead, even when no failures occur. To reduce this overhead, the author derives the second protocol, which has a provably minimal message overhead in the absence of failures but provides a longer failure detection delay and is more complex. He concludes by comparing his approach with other known approaches.

Flaviu Cristian
