A necessary and sufficient condition for transforming limited accuracy failure detectors

Unreliable failure detectors are oracles that give information about process failures. Chandra and Toueg were the first to study such failure detectors for distributed systems, and they identified several classes that make it possible to solve the Consensus problem in asynchronous distributed systems. This paper focuses on two of these classes, denoted S (strong) and ◇S (eventually strong). The characteristics of a given unreliable failure detector are usually described by its completeness and accuracy properties. Completeness is a requirement on the actual detection of failures, while accuracy limits the mistakes a failure detector can make. Let the scope of the accuracy property of an unreliable failure detector be the minimum number (k) of processes that may not erroneously suspect a correct process to have crashed. Usual failure detectors implicitly consider a scope equal to n (the total number of processes). Accuracy properties with limited scope give rise to the classes of failure detectors that we call S_k and ◇S_k. This paper investigates the following question: "Given S_k and ◇S_k, under which condition is it possible to transform their failure detectors into their counterparts with unlimited accuracy, i.e., S and ◇S?" The paper answers this question in two steps. It first presents a particularly simple protocol that realizes such a transformation when f < k (where f is the maximum number of processes that may crash). Then, it shows that there is no reduction protocol when f ≥ k. The condition f < k is therefore both necessary and sufficient.
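
The abstract does not describe the protocol itself, so the following is only a counting illustration of why the boundary falls at f < k, not the paper's algorithm. It assumes a hypothetical threshold rule: a process outputs a suspicion of q only once at least n - k + 1 processes are known to suspect q. The function names and the rule are assumptions introduced here for illustration.

```python
def condition_from_abstract(n: int, k: int, f: int) -> bool:
    """Condition stated in the abstract: a transformation exists iff f < k."""
    if not (1 <= k <= n and 0 <= f < n):
        raise ValueError("expected 1 <= k <= n and 0 <= f < n")
    return f < k


def threshold_rule_works(n: int, k: int, f: int) -> bool:
    """Check a hypothetical threshold rule (an assumption, not the paper's protocol).

    Assumed rule: suspect q only when at least n - k + 1 processes suspect q.
    """
    if not (1 <= k <= n and 0 <= f < n):
        raise ValueError("expected 1 <= k <= n and 0 <= f < n")
    threshold = n - k + 1

    # Accuracy with scope k: some correct process is not suspected by
    # k processes, so at most n - k processes can ever suspect it.
    accuracy_ok = (n - k) < threshold

    # Completeness: a crashed process is eventually suspected by every
    # correct process, and at least n - f processes are correct.
    completeness_ok = (n - f) >= threshold

    return accuracy_ok and completeness_ok


if __name__ == "__main__":
    n = 5
    for k in range(1, n + 1):
        row = [threshold_rule_works(n, k, f) for f in range(n)]
        print(f"k={k}: works for f in {[f for f, ok in enumerate(row) if ok]}")
```

Under this assumed rule, the accuracy check always holds, and the completeness check n - f ≥ n - k + 1 reduces to f < k, which matches the sufficient side of the condition. The sketch says nothing about the impossibility result for f ≥ k, which concerns every possible reduction protocol and not just this rule.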
