The weakest failure detector for eventual consistency

In its classical form, a consistent replicated service requires all replicas to witness the same evolution of the service state. If we consider an asynchronous message-passing environment in which processes might fail by crashing, and assume that a majority of processes are correct, then the necessary and sufficient information about failures for implementing a general state machine replication scheme ensuring consistency is captured by the $$\varOmega$$ failure detector. This paper shows that in such a message-passing environment, $$\varOmega$$ is also the weakest failure detector to implement an eventually consistent replicated service, where replicas are expected to agree on the evolution of the service state only after some (a priori unknown) time. In fact, we show that $$\varOmega$$ is the weakest to implement eventual consistency in any message-passing environment, i.e., under any assumption on when and where failures might occur. Ensuring (strong) consistency in any environment requires, in addition to $$\varOmega$$, the quorum failure detector $$\varSigma$$. Our paper thus captures, for the first time, an exact computational difference between building a replicated state machine that ensures consistency and one that only ensures eventual consistency.
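To illustrate the interface that $$\varOmega$$ provides (this is a hypothetical heartbeat-based simulation for intuition, not the paper's construction), the sketch below has each process eventually trust the smallest-id process it still receives heartbeats from. After a crash, all correct processes converge on the same correct leader, which is exactly the eventual-agreement guarantee $$\varOmega$$ captures:

```python
# Minimal simulation of an Omega-style eventual leader election.
# Hypothetical sketch: real Omega implementations need partial-synchrony
# assumptions; here time, heartbeats, and crashes are simulated directly.

class Process:
    def __init__(self, pid, n, timeout=3):
        self.pid = pid
        self.timeout = timeout
        # Last simulated time at which a heartbeat arrived from each process.
        self.last_heard = {p: 0 for p in range(n)}

    def receive_heartbeat(self, sender, now):
        self.last_heard[sender] = now

    def leader(self, now):
        # Omega's output: trust the smallest id not currently suspected,
        # where "suspected" means no heartbeat within the timeout window.
        trusted = [p for p, t in self.last_heard.items() if now - t < self.timeout]
        return min(trusted) if trusted else self.pid

# Simulate 4 processes; process 0 crashes at time 5.
n = 4
procs = {p: Process(p, n) for p in range(n)}
crashed = set()
for now in range(1, 20):
    if now == 5:
        crashed.add(0)
    for sender in range(n):
        if sender not in crashed:
            for q in procs.values():
                if q.pid not in crashed:
                    q.receive_heartbeat(sender, now)
    leaders = {procs[p].leader(now) for p in range(n) if p not in crashed}

print(leaders)  # after stabilization, every correct process trusts process 1
```

Before process 0's crash is detected the outputs may disagree; the point is only that they eventually stabilize on one correct process, which is all that $$\varOmega$$ promises.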
