A hybrid approach for building eventually accurate failure detectors

Unreliable failure detectors, introduced by Chandra and Toueg, are abstract mechanisms that provide information about process crashes. On the one hand, failure detectors make it possible to state the minimal requirements on process failures under which problems that are otherwise unsolvable in purely asynchronous systems can be solved. On the other hand, they cannot be implemented in such systems: their implementation requires that the underlying distributed system be enriched with additional assumptions. Classic failure detector implementations rely on additional synchrony assumptions such as partial synchrony. More recently, a new approach for implementing failure detectors has been proposed: it relies on behavioral properties of the flow of messages exchanged. This paper shows that these two approaches are not antagonistic and can be advantageously combined. A hybrid protocol (the first to our knowledge) implementing failure detectors with eventual accuracy properties is presented. Interestingly, this protocol benefits from the best of both worlds in the sense that it converges (i.e., provides the required failure detector) as soon as either the system behaves synchronously or the required message exchange pattern is satisfied. This shows that, to expedite convergence, it can be worthwhile to assume that the underlying system may satisfy any of several alternative assumptions.
