A Versatile Family of Consensus Protocols Based on Chandra-Toueg's Unreliable Failure Detectors

This paper addresses consensus protocols for asynchronous distributed systems that are prone to process crashes but equipped with Chandra-Toueg's (1996) unreliable failure detectors. It presents a unifying approach based on two orthogonal versatility dimensions. The first concerns the class of the underlying failure detector: an instantiation can use any failure detector of the class S (provided that at least one process does not crash) or of the class ◇S (provided that a majority of processes do not crash). The second concerns the message exchange pattern used during each round of the protocol. This pattern (and, consequently, the message cost of the round) can be defined for each round separately, varying from O(n) (centralized pattern) to O(n^2) (fully distributed pattern), where n is the number of processes. The resulting versatile protocol gives rise to a large and well-identified family of failure detector-based consensus protocols, which includes both new protocols and some well-known ones (e.g., Chandra-Toueg's ◇S-based protocol). The approach is also interesting from a methodological point of view: it provides a precise characterization of the two sets of processes that, during a round, have to receive messages for a decision to be taken (liveness) and for a single value to be decided (safety), respectively. Finally, the versatility of the protocol is not restricted to failure detectors: a simple timer-based instance provides a consensus protocol suited to partially synchronous systems.
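
The two dimensions can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the paper's protocol: the names `FailureDetectorClass`, `max_tolerated_crashes`, and `round_message_count` are hypothetical, and the counting model (every process sends one round message to each member of a chosen receiver set, excluding itself) is an assumption made here to show how the per-round cost moves between O(n) and O(n^2).

```python
from enum import Enum


class FailureDetectorClass(Enum):
    """Hypothetical labels for the two detector classes named in the abstract."""
    S = "S"                    # class S
    EVENTUAL_S = "eventual S"  # class written as diamond-S


def max_tolerated_crashes(fd_class: FailureDetectorClass, n: int) -> int:
    """Largest number of crashes compatible with the abstract's conditions.

    S needs at least one process that does not crash, so up to n - 1 crashes.
    Eventual S needs a majority that does not crash, so fewer than n / 2 crashes.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if fd_class is FailureDetectorClass.S:
        return n - 1
    return (n - 1) // 2


def round_message_count(n: int, receivers: int) -> int:
    """Messages in one round under the ASSUMED counting model.

    Each of the n processes sends one message to every member of a receiver
    set of size `receivers`, except to itself. Members of the set each send
    receivers - 1 messages and the others send `receivers`, which sums to
    receivers * (n - 1).
    """
    if not 1 <= receivers <= n:
        raise ValueError("receivers must be between 1 and n")
    return receivers * (n - 1)


if __name__ == "__main__":
    n = 5
    print(max_tolerated_crashes(FailureDetectorClass.S, n))
    print(max_tolerated_crashes(FailureDetectorClass.EVENTUAL_S, n))
    print(round_message_count(n, 1))  # centralized: linear in n
    print(round_message_count(n, n))  # fully distributed: quadratic in n
```

Under this model a single receiver gives n - 1 messages per round and a receiver set of all n processes gives n(n - 1), which matches the two ends of the range stated above. Intermediate receiver-set sizes fall between them.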
