On the Implementation of Unreliable Failure Detectors in Partially Synchronous Systems

Unreliable failure detectors were proposed by Chandra and Toueg as mechanisms that provide information about process failures. Chandra and Toueg defined eight classes of failure detectors, depending on how accurate this information is, and presented an algorithm implementing a failure detector of one of these classes in a partially synchronous system. This algorithm is based on all-to-all communication and periodically exchanges a number of messages that is quadratic in the number of processes. We study the implementability of different classes of failure detectors in several models of partial synchrony. We first show that no failure detector with perpetual accuracy (namely, P, Q, S, and W) can be implemented in these models in systems in which even a single process may fail. We also show that, in these models of partial synchrony, a majority of correct processes is necessary to implement a failure detector of the class θ proposed by Aguilera et al. Then, we present a family of distributed algorithms that implement the four classes of unreliable failure detectors with eventual accuracy (namely, ◇P, ◇Q, ◇S, and ◇W). Our algorithms are based on a logical ring arrangement of the processes, which defines the monitoring and failure information propagation pattern. The resulting algorithms periodically exchange at most a linear number of messages.
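The contrast between all-to-all and ring-based monitoring can be illustrated with a minimal sketch. It is not the algorithm from the paper: the function names, the one-heartbeat-per-neighbour cost model, and the rule of skipping suspected processes are assumptions made here for illustration only.

```python
from typing import Optional, Set


def all_to_all_messages(n: int) -> int:
    """Messages per period if every process sends one message to every other process."""
    return n * (n - 1)


def ring_messages(n: int) -> int:
    """Messages per period if each process sends one message to a single ring neighbour.

    Assumed cost model: one message per monitored link, with no replies counted.
    """
    return n if n > 1 else 0


def monitored_target(p: int, n: int, suspected: Set[int]) -> Optional[int]:
    """Return the first process after p in ring order that p does not suspect.

    Processes are numbered 0..n-1 and arranged in a logical ring (hypothetical
    numbering). Returns None if p suspects every other process.
    """
    for step in range(1, n):
        q = (p + step) % n
        if q not in suspected:
            return q
    return None
```

Under these assumptions, `all_to_all_messages(5)` gives 20 messages per period while `ring_messages(5)` gives 5, and `monitored_target(0, 5, {1, 2})` selects process 3 because processes 1 and 2 are skipped as suspected.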
