From an Intermittent Rotating Star to a Leader

Consider an asynchronous system made up of n processes, up to t of which can crash. Finding the weakest assumptions that such a system has to satisfy for a common leader to be eventually elected is one of the holy grail quests of fault-tolerant asynchronous computing. This paper is a step in that quest. It has two main contributions. First, it proposes an asynchronous system model in which an eventual leader can be elected, and which is weaker and more general than previous models. This model is captured by the notion of intermittent rotating t-star. An x-star is a set of x + 1 processes: a process p (the center of the star) plus a set of x processes (the points of the star). Intuitively, assuming logical times rn (round numbers), the intermittent rotating t-star assumption means that there are a process p, a subset of the round numbers rn, and associated sets Q(rn) such that each set {p} ∪ Q(rn) is a t-star centered at p, and each process of Q(rn) receives from p a message tagged rn either in a timely manner or among the first (n - t) messages tagged rn it ever receives. The star is called t-rotating because the set Q(rn) is allowed to change with rn. It is called intermittent because the star can disappear during finite periods. This assumption not only combines but also generalizes several synchrony and time-free assumptions that have previously been proposed to elect an eventual leader (e.g., eventual t-source, eventual t-moving source, message pattern assumption). Each of these assumptions appears as a particular case of the intermittent rotating t-star assumption. The second contribution of the paper is an algorithm that eventually elects a common leader in any system that satisfies the intermittent rotating t-star assumption. That algorithm enjoys, among others, two noteworthy properties. First, from a design point of view, it is simple. Second, from a cost point of view, only the round numbers can increase without bound. This means that, whether the execution is finite or infinite, and whether links are timely or not (and whether the corresponding senders have crashed or not), all the other local variables (including the timers) and all the other message fields have a finite domain.
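
The per-round condition in the star definition can be made concrete with a small sketch. The Python below is not the paper's algorithm. It only checks, from an omniscient observer's point of view, whether a process qualifies as a point of a star centered at p for one round number rn, under the two alternatives stated above: the message from p tagged rn is timely, or it is among the first (n - t) messages tagged rn that the process receives. The names `Reception`, `is_star_point`, `star_holds`, and the bound `delta` are hypothetical and introduced for illustration only, and the paper's formal definition may impose further conditions that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Reception:
    """One message reception at a process (hypothetical record type)."""
    sender: int   # identity of the sending process
    rn: int       # round number carried by the message
    delay: float  # transfer delay, visible only to an external observer


def is_timely(log: List[Reception], p: int, rn: int, delta: float) -> bool:
    # Timer-based alternative: the message from p tagged rn arrived
    # within the (assumed) bound delta.
    return any(r.sender == p and r.rn == rn and r.delay <= delta for r in log)


def is_winning(log: List[Reception], p: int, rn: int, n: int, t: int) -> bool:
    # Time-free alternative: the message from p is among the first
    # (n - t) messages tagged rn; log is assumed to be in reception order.
    first = [r.sender for r in log if r.rn == rn][: n - t]
    return p in first


def is_star_point(log: List[Reception], p: int, rn: int,
                  n: int, t: int, delta: float) -> bool:
    return is_timely(log, p, rn, delta) or is_winning(log, p, rn, n, t)


def star_holds(logs: Dict[int, List[Reception]], p: int, rn: int,
               n: int, t: int, delta: float) -> bool:
    # A t-star centered at p exists for round rn if at least t
    # processes other than p qualify as points.
    points = [q for q, log in logs.items()
              if q != p and is_star_point(log, p, rn, n, t, delta)]
    return len(points) >= t


# Toy data: n = 4 processes, t = 1, candidate center p = 0, round rn = 5.
logs = {
    # Message from 0 is late and arrives fourth: not a point.
    1: [Reception(2, 5, 0.3), Reception(3, 5, 0.4),
        Reception(1, 5, 0.1), Reception(0, 5, 9.0)],
    # Message from 0 is late but arrives first: point (time-free case).
    2: [Reception(0, 5, 9.0), Reception(1, 5, 9.5), Reception(3, 5, 9.9)],
    # Message from 0 arrives fourth but within delta: point (timely case).
    3: [Reception(1, 5, 0.1), Reception(2, 5, 0.2),
        Reception(3, 5, 0.3), Reception(0, 5, 0.5)],
}

print(star_holds(logs, p=0, rn=5, n=4, t=1, delta=1.0))
```

In this toy run, processes 2 and 3 qualify through different alternatives, which illustrates how the assumption lets timer-based and time-free conditions be mixed within a single round.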
