Practical Experience Report: The Performance of Paxos in the Cloud

This experience report presents the results of an extensive performance evaluation of four open-source implementations of Paxos deployed in Amazon EC2. Paxos is a fundamental algorithm for building fault-tolerant services and lies at the core of state-machine replication. Implementations of Paxos are used in many prototypes and production systems in both academia and industry. Although all of the protocols surveyed implement Paxos, each is optimized in different ways, and, as we show, these optimizations lead to markedly different behavior. We consider a variety of configurations, covering both failure-free and faulty executions. In addition to reporting our findings, we propose and assess further optimizations to the existing implementations.
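For readers unfamiliar with the protocol, below is a minimal sketch of textbook single-decree Paxos. It assumes an in-memory simulation: acceptors are plain objects, messages are direct method calls, and a crashed acceptor is modeled by leaving it out of a hypothetical `up` set. It does not reflect any of the four implementations evaluated, and the names `Acceptor` and `propose` are illustrative only. The sketch shows the two phases and why a value chosen by one majority survives when a different majority takes over after a failure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                 # highest ballot this acceptor has promised
    accepted_ballot: int = -1          # ballot of the last accepted value (-1: none)
    accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1b: promise to ignore lower ballots and report the last accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return (self.accepted_ballot, self.accepted_value)
        return None  # reject: a ballot at least this high was already promised

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str,
            up: Optional[Iterable[int]] = None) -> Optional[str]:
    """Run one proposer round; return the chosen value, or None if no majority."""
    live = acceptors if up is None else [acceptors[i] for i in up]
    quorum = len(acceptors) // 2 + 1   # majority of ALL acceptors, not just live ones

    # Phase 1a/1b: collect promises from the live acceptors.
    promises = [p for p in (a.prepare(ballot) for a in live) if p is not None]
    if len(promises) < quorum:
        return None

    # Adopt the value accepted under the highest ballot, if any; else our own.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    chosen = prior_value if prior_ballot >= 0 else value

    # Phase 2a/2b: ask the live acceptors to accept the (possibly adopted) value.
    acks = sum(a.accept(ballot, chosen) for a in live)
    return chosen if acks >= quorum else None


demo = [Acceptor() for _ in range(3)]
print(propose(demo, ballot=1, value="x", up={0, 1}))  # x  (acceptor 2 is down)
print(propose(demo, ballot=2, value="y", up={1, 2}))  # x  (acceptor 0 is down)
```

The second call uses a higher ballot and a different majority, yet still returns `"x"`: acceptor 1 belongs to both majorities and reports the earlier accepted value during phase 1, so the new proposer must adopt it. Multi-Paxos, which underlies state-machine replication, runs one such instance per log slot and typically keeps a stable leader so that phase 1 can be skipped in the common case.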
