Optimistic Causal Consistency for Geo-Replicated Key-Value Stores

In this paper we present a new approach to implementing causal consistency in geo-replicated data stores, which we call Optimistic Causal Consistency (OCC). The optimism in our approach lies in the fact that updates from a remote data center are made visible in the local data center immediately, without first checking whether their causal dependencies have been received. Servers perform the dependency check needed to enforce causal consistency only upon serving a client operation, rather than upon receipt of a replicated data item as in existing systems. OCC thus explores a novel trade-off in the landscape of causal consistency protocols. Its potentially blocking behavior makes it vulnerable to network partitions; because partitions are rare in practice, however, OCC trades availability to maximize data freshness and reduce communication overhead. We further propose a recovery mechanism that allows an OCC system to fall back on a pessimistic protocol and continue operating even during network partitions. We present POCC, an implementation of OCC based on physical clocks. We show that OCC improves data freshness while offering comparable or better performance than its pessimistic counterpart.
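To make the deferred dependency check concrete, the following is a minimal sketch, not the paper's actual protocol or code: it assumes a single remote data center, physical-clock timestamps, and illustrative names (Item, LocalServer, apply_remote, read, dependency_ts) that are not taken from POCC's interfaces.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Item:
    key: str
    value: bytes
    update_ts: float      # physical-clock timestamp assigned by the originating data center
    dependency_ts: float  # timestamp by which all of this update's causal dependencies were created

@dataclass
class LocalServer:
    store: dict = field(default_factory=dict)
    # Highest remote timestamp whose updates are known to have been applied locally.
    # (A real protocol would track this per remote data center; one scalar keeps the sketch short.)
    remote_applied_ts: float = 0.0

    def apply_remote(self, item: Item) -> None:
        # Optimistic step: install the remote update immediately,
        # without checking whether its causal dependencies have arrived.
        self.store[item.key] = item
        self.remote_applied_ts = max(self.remote_applied_ts, item.update_ts)

    def read(self, key: str, timeout_s: float = 1.0) -> bytes:
        # Deferred dependency check: block the client operation until the
        # item's causal dependencies are known to be visible locally.
        item = self.store[key]
        deadline = time.monotonic() + timeout_s
        while self.remote_applied_ts < item.dependency_ts:
            if time.monotonic() > deadline:
                # This is where a system could fall back on a pessimistic
                # protocol (cf. the recovery mechanism) during a partition.
                raise TimeoutError("causal dependencies not yet replicated")
            time.sleep(0.001)
        return item.value
```

The point the sketch illustrates is the shift of the potentially blocking check from replication time to read time, which is what exposes OCC to partitions while improving data freshness.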
