Ring Paxos: High-Throughput Atomic Broadcast

Atomic broadcast is an important communication primitive, often used to implement state-machine replication. Despite the large number of atomic broadcast algorithms proposed in the literature, few papers have discussed how to turn these algorithms into efficient executable protocols. This paper focuses on a class of atomic broadcast algorithms based on Paxos, which offers several desirable properties: safety under asynchrony, liveness under weak synchrony assumptions, and resiliency-optimality. The paper presents two protocols derived from Paxos, M-Ring Paxos and U-Ring Paxos. Both inherit the properties of Paxos and can be implemented very efficiently. We report a detailed performance analysis of M-Ring Paxos and U-Ring Paxos and compare them to other atomic broadcast protocols.
