A robust and lightweight stable leader election service for dynamic systems

We describe the implementation and experimental evaluation of a fault-tolerant leader election service for dynamic systems. Intuitively, distributed applications can use this service to elect and maintain an operational leader for any group of processes whose membership may change dynamically. If the leader of a group crashes, is temporarily disconnected, or voluntarily leaves the group, the service automatically elects a new group leader. The current version of the service implements two recent leader election algorithms, and users can select the one that best fits their system. Both algorithms ensure leader stability, a desirable feature that some other algorithms lack, but one is more robust in the face of extreme network disruptions, while the other is more scalable. The leader election service is flexible and easy to use. By using a stochastic failure detector and a link quality estimator, it provides some degree of QoS control and adapts to changing network conditions. Our experimental evaluation indicates that it is also highly robust and inexpensive to run in practice.
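To make the notion of leader stability concrete, the following is a minimal Python sketch, not either of the two algorithms the service implements. It assumes that stability means a leader is not demoted while it remains trusted, and it replaces the stochastic failure detector and link quality estimator with a fixed heartbeat timeout. The class `StableElector` and all of its method names and parameters are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StableElector:
    """Toy stable leader elector for one group (illustrative only)."""

    timeout: float = 3.0  # assumed fixed suspicion timeout, in seconds
    last_heard: Dict[str, float] = field(default_factory=dict)
    leader: Optional[str] = None

    def heartbeat(self, pid: str, now: float) -> None:
        """Record that process `pid` was heard from at time `now` (also used to join)."""
        self.last_heard[pid] = now

    def leave(self, pid: str) -> None:
        """Remove a process that voluntarily leaves the group."""
        self.last_heard.pop(pid, None)

    def trusted(self, now: float) -> List[str]:
        """Members whose last heartbeat arrived within the timeout, sorted by id."""
        return sorted(p for p, t in self.last_heard.items() if now - t <= self.timeout)

    def current_leader(self, now: float) -> Optional[str]:
        """Keep the current leader while it is trusted; otherwise pick the smallest trusted id."""
        candidates = self.trusted(now)
        if self.leader not in candidates:
            self.leader = candidates[0] if candidates else None
        return self.leader


if __name__ == "__main__":
    e = StableElector(timeout=3.0)
    e.heartbeat("p2", 0.0)
    print(e.current_leader(0.0))  # p2 is the only member
    e.heartbeat("p1", 1.0)
    e.heartbeat("p2", 1.0)
    print(e.current_leader(1.0))  # still p2: a smaller id joining does not demote it
    e.heartbeat("p1", 5.0)
    print(e.current_leader(5.0))  # p2 has been silent for 4 s, so p1 takes over
```

An elector that always returned the smallest trusted id would instead switch from `p2` to `p1` as soon as `p1` joined. Avoiding that kind of unnecessary leader change is what the stability property is meant to capture in this sketch.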
