Elastic Paxos: A Dynamic Atomic Multicast Protocol

Replication is a common technique used to design reliable distributed systems by masking defective components. To cope with the requirements of modern Internet applications, replication protocols must allow for throughput scalability and dynamic reconfiguration, that is, on-demand replacement or provisioning of system resources. This paper describes Elastic Paxos, a new dynamic atomic multicast protocol that fulfills these requirements. Elastic Paxos makes it possible to dynamically add resources to, and remove them from, an online partially replicated state machine. We implemented Elastic Paxos and evaluated its performance in OpenStack, a cloud environment. We demonstrate its practicality by using it to dynamically scale a partially replicated data store up and down and to reconfigure a distributed system.
