Crash-resilient time-free eventual leadership

Leader-based protocols rest on a primitive that provides all processes with the same unique leader. Such protocols are very common in distributed computing, where they are used to solve synchronization and coordination problems. Unfortunately, providing such a primitive is far from trivial in asynchronous distributed systems prone to process crashes. (It is even impossible in fault-prone purely asynchronous systems.) To circumvent this difficulty, several protocols have been proposed that build a leader facility on top of an asynchronous distributed system enriched with synchrony assumptions. This paper considers another approach to building a leader facility: it relies on a behavioral property of the flow of messages that are exchanged. This property has the noteworthy feature of not involving timing assumptions. Two protocols that implement a leader primitive on the basis of this time-free property are described. The first one uses potentially unbounded counters, while the second one (which is a little more involved) requires only finite memory. These protocols rely on simple design principles that make them attractive, easy to understand, and provably correct.
