State Machine Replication Is More Expensive Than Consensus

Consensus and State Machine Replication (SMR) are generally considered to be equivalent problems. Indeed, in certain system models the two are computationally equivalent: any solution to one yields a solution to the other. In this paper, we study the relation between consensus and SMR from a complexity perspective. We find that, surprisingly, completing an SMR command can be more expensive than solving a consensus instance. Specifically, in a synchronous system model where every instance of consensus always terminates in constant time, completing an SMR command does not necessarily terminate in constant time. This result naturally extends to partially synchronous models. Beyond its theoretical interest, our result also corresponds to practical phenomena that we identify empirically. We experiment with two well-known SMR implementations (Multi-Paxos and Raft) and show that SMR is indeed more expensive than consensus in practice. One important implication of our result is that, even under synchrony conditions, no SMR algorithm can ensure bounded response times.

2012 ACM Subject Classification: Computing methodologies → Distributed algorithms
