Consistency or latency? A quantitative analysis of replication systems based on replicated state machines

Theories such as CAP and PACELC hold that distributed replication systems face tradeoffs between pairs of performance measures, such as consistency and latency. However, current systems take only a vague view of how to balance those tradeoffs; eventual consistency is one example. In this work, we provide a quantitative analysis of consistency and latency for widely used replicated state machines (RSMs). Based on a generic RSM model that we call RSM-d, we build probabilistic models to quantify consistency and latency. We show that both are affected by d, the number of ACKs the coordinator receives before committing a write request. We further define a payoff model by combining the consistency and latency models. Finally, using Monte Carlo simulation, we validate the models and show how they can be used to obtain an optimal tradeoff between consistency and latency.
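The role of d can be illustrated with a small Monte Carlo sketch. This is not the paper's model: the exponential delay distribution, the function names `simulate` and `payoff`, the read-at-commit-time freshness measure, and the linear payoff weight are all assumptions made purely for illustration. The sketch treats write latency as the arrival time of the d-th ACK at the coordinator, and treats consistency as the probability that a read sent to a uniformly random replica at commit time already sees the write.

```python
import random


def simulate(n, d, trials=20000, mean_delay=1.0, seed=0):
    """Estimate (mean write latency, fresh-read probability) for n replicas
    when the coordinator commits after d ACKs.

    Assumption: one-way message delays are i.i.d. exponential with the
    given mean. This is an illustrative choice, not taken from the paper.
    """
    if not 1 <= d <= n:
        raise ValueError("d must satisfy 1 <= d <= n")
    rng = random.Random(seed)
    total_latency = 0.0
    fresh_reads = 0
    for _ in range(trials):
        # Time for the write to reach each replica.
        write_delay = [rng.expovariate(1.0 / mean_delay) for _ in range(n)]
        # Time for each replica's ACK to travel back to the coordinator.
        ack_delay = [rng.expovariate(1.0 / mean_delay) for _ in range(n)]
        # The coordinator commits when the d-th ACK arrives.
        ack_arrival = sorted(w + a for w, a in zip(write_delay, ack_delay))
        commit_time = ack_arrival[d - 1]
        total_latency += commit_time
        # A read hits a random replica at commit time; it is fresh if the
        # write has already been delivered to that replica.
        replica = rng.randrange(n)
        if write_delay[replica] <= commit_time:
            fresh_reads += 1
    return total_latency / trials, fresh_reads / trials


def payoff(n, d, latency_weight=0.2, **kwargs):
    """Hypothetical linear payoff: consistency minus weighted latency."""
    latency, freshness = simulate(n, d, **kwargs)
    return freshness - latency_weight * latency


if __name__ == "__main__":
    n = 5
    for d in range(1, n + 1):
        latency, freshness = simulate(n, d)
        print(f"d={d}  latency={latency:.3f}  fresh={freshness:.3f}  "
              f"payoff={payoff(n, d):.3f}")
```

Under these assumptions, raising d increases both the estimated latency and the estimated freshness, so the d that maximizes the payoff depends on how heavily latency is weighted.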
