The Generic Consensus Service

This paper describes a modular approach for the construction of fault-tolerant agreement protocols. The approach is based on a generic consensus service. Fault-tolerant agreement protocols are built using a client-server interaction, where the clients are the processes that must solve the agreement problem and the servers implement the consensus service. This service is accessed through a generic consensus filter, customized for each specific agreement problem. We illustrate our approach on the construction of various fault-tolerant agreement protocols, such as nonblocking atomic commitment, group membership, view synchronous communication, and total order multicast. Through a systematic reduction to consensus, we provide a simple way to solve agreement problems. In addition to its modularity, our approach enables efficient implementations of agreement protocols and precise characterization of the assumptions underlying their liveness and safety properties.
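The client-server structure described above can be illustrated with a small sketch. It is not the paper's protocol: the names `ConsensusService`, `make_commit_filter`, and `order_filter` are hypothetical, the agreement among servers is stubbed out as a single in-memory object whose first decision per instance is final, and the filter rules (abort unless every participant voted yes; order pending messages by identifier) are assumptions chosen only to show how one service might be customized per agreement problem.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

# A "filter" (hypothetical signature) maps the inputs collected from clients
# to the single value that is proposed to the consensus service.
ConsensusFilter = Callable[[Dict[str, Hashable]], Hashable]


def make_commit_filter(participants: Iterable[str]) -> ConsensusFilter:
    """Assumed filter for atomic commitment: commit only if all voted yes."""
    expected = tuple(participants)

    def commit_filter(votes: Dict[str, Hashable]) -> str:
        # A missing vote (e.g. from a suspected process) is treated as "no".
        all_yes = all(votes.get(p) == "yes" for p in expected)
        return "commit" if all_yes else "abort"

    return commit_filter


def order_filter(pending: Dict[str, Tuple[str, ...]]) -> Tuple[str, ...]:
    """Assumed filter for total order: one deterministic batch of message ids."""
    ids = {m for batch in pending.values() for m in batch}
    return tuple(sorted(ids))


class ConsensusService:
    """Stub standing in for the servers: the first decision per instance sticks.

    A real service would run a consensus algorithm among several servers;
    that part is deliberately omitted here.
    """

    def __init__(self) -> None:
        self._decisions: Dict[str, Hashable] = {}

    def run(self, instance_id: str, client_inputs: Dict[str, Hashable],
            consensus_filter: ConsensusFilter) -> Hashable:
        if instance_id not in self._decisions:
            self._decisions[instance_id] = consensus_filter(client_inputs)
        return self._decisions[instance_id]
```

In this sketch, the same `ConsensusService` object serves both problems; only the filter passed to `run` changes, which is the modularity the abstract refers to.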
