Low cost consensus-based Atomic Broadcast

Atomic Broadcast, in which all processes deliver the same set of messages in the same order, is a powerful communication primitive for building fault-tolerant distributed systems. It has been shown that Atomic Broadcast and Consensus are equivalent problems in asynchronous distributed systems prone to process crash failures, and several Consensus-based Atomic Broadcast protocols have been designed as a result. This paper introduces a new and particularly efficient Consensus-based Atomic Broadcast protocol. Its efficiency comes from invoking the Consensus subroutine only when asynchrony and crashes prevent processes from reaching a simple agreement on the message delivery order. The protocol assumes n > 2f, where n is the number of processes and f is the maximum number of processes that can crash. In the most favorable cases, processes need two communication steps to determine a message batch; in the worst case, an additional Consensus execution is required. When n > 3f, the protocol can be simplified, and it then requires a single communication step in the most favorable cases. This exhibits an interesting trade-off between the cost of the protocol and the maximum number of process failures it tolerates.
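
The trade-off stated in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not part of the protocol: the function names `max_crashes` and `tradeoff` are hypothetical, and the code simply evaluates the two resilience conditions (n > 2f and n > 3f) together with the best-case step counts quoted above.

```python
def max_crashes(n: int, ratio: int) -> int:
    """Return the largest f such that n > ratio * f (hypothetical helper)."""
    if n < 1 or ratio < 1:
        raise ValueError("n and ratio must be positive integers")
    # n > ratio * f  <=>  f <= (n - 1) // ratio for integer f
    return (n - 1) // ratio


def tradeoff(n: int) -> dict:
    """Summarize both variants for a system of n processes.

    The step counts are the best-case figures given in the abstract;
    the worst case adds a Consensus execution and is not modeled here.
    """
    return {
        "n > 2f": {"max_f": max_crashes(n, 2), "best_case_steps": 2},
        "n > 3f": {"max_f": max_crashes(n, 3), "best_case_steps": 1},
    }


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, tradeoff(n))
```

For example, with n = 7 the general protocol tolerates up to 3 crashes at a best-case cost of two communication steps, while the simplified variant tolerates up to 2 crashes at a best-case cost of one step.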
