Using asynchrony and zero degradation to speed up indulgent consensus protocols

Existing consensus protocols slow down when processes fail or when the underlying oracles make mistakes. In this paper, we propose two novel techniques to circumvent such slowdowns in failure-detector-based consensus protocols. The first technique guarantees the Round-Zero-Degradation (RZD) property, an extension of the Zero-Degradation property, to avoid the slowdown caused by a failed coordinator process. The second technique, named "Look-Ahead", speeds up the execution of the consensus protocol by making use of messages that are delivered before their receivers enter the corresponding phases or rounds. The first technique is effective only when the underlying failure detector makes few or no mistakes, whereas the second works well regardless of the performance of the failure detector. Moreover, Look-Ahead is a general technique and can be applied to consensus protocols based on any kind of oracle. We develop several consensus protocols by applying the two proposed techniques. Simulation results show that the RZD technique remains effective even when the error rate of the failure detector reaches about 15%, while the Look-Ahead technique improves performance in all cases.
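The general idea behind Look-Ahead, keeping messages that arrive before the receiver has reached the round they belong to instead of discarding them, can be illustrated with a small sketch. This is not the protocol developed in the paper: the class `EarlyMessageBuffer`, its method names, and the majority test are hypothetical and chosen only for illustration. Whether a process may safely act on early messages depends on the specific consensus protocol.

```python
from collections import defaultdict


class EarlyMessageBuffer:
    """Hypothetical helper: stores round-tagged messages until they are needed."""

    def __init__(self, n_processes):
        self.n = n_processes
        # round number -> {sender id: value carried by the message}
        self.by_round = defaultdict(dict)

    def deliver(self, round_no, sender, value):
        # Keep the message even if the receiver has not reached round_no yet.
        self.by_round[round_no][sender] = value

    def messages_for(self, round_no):
        # Messages already available when the receiver enters round_no.
        return dict(self.by_round.get(round_no, {}))

    def furthest_round_with_majority(self, current_round):
        # Highest round at or after current_round for which messages from
        # a majority of processes have already been delivered, else None.
        majority = self.n // 2 + 1
        ready = [r for r, msgs in self.by_round.items()
                 if r >= current_round and len(msgs) >= majority]
        return max(ready) if ready else None


# Illustrative run: a process still in round 1 receives round-3 messages early.
buf = EarlyMessageBuffer(n_processes=5)
buf.deliver(1, "p1", "a")
buf.deliver(3, "p2", "b")
buf.deliver(3, "p3", "b")
buf.deliver(3, "p4", "b")
target = buf.furthest_round_with_majority(current_round=1)
early = buf.messages_for(3)
```

In this run the process is in round 1 but already holds round-3 messages from three of five processes, so `target` is 3 and `early` contains those three messages. A protocol using such a buffer would still have to decide, under its own safety argument, what it may do with them.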
