Distributed transactional reads: the strong, the quick, the fresh & the impossible

This paper studies the costs and trade-offs of providing transactionally consistent reads in a distributed storage system. We identify three dimensions: read consistency, read delay (latency), and data freshness. We show that there is a three-way trade-off between them, which can be summarised as follows: (i) it is not possible to ensure simultaneously order-preserving (e.g., causally consistent) or atomic reads, minimal delay, and maximal freshness; thus, reading the freshest data without delay is possible only in a weakly isolated mode; (ii) ensuring atomic or order-preserving reads at minimal delay requires reading data from the past (i.e., data that is not the freshest); (iii) however, order-preserving minimal-delay reads can be fresher than atomic ones; (iv) reading atomic or order-preserving data at maximal freshness may block reads or writes indefinitely. Our impossibility results hold independently of other features of the database, such as update semantics (totally ordered or not) or data model (structured or unstructured). Guided by these results, we modify an existing protocol to ensure minimal-delay reads (at the cost of freshness) under atomic-visibility and causally consistent semantics. Our experimental evaluation supports the theoretical results.
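To make points (i), (ii) and (iv) concrete, here is a minimal Python sketch of a hypothetical two-shard multi-version store. It is not the paper's protocol: the `Partition` class, the `applied_ts` watermark, and the rule of reading at the minimum watermark across shards (a common stable-snapshot technique) are illustrative assumptions. The scenario has transaction T2 write keys on both shards while only one shard has installed its write so far.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One shard of a hypothetical multi-version key-value store."""
    # key -> list of (commit_ts, value), kept sorted by commit_ts
    versions: dict = field(default_factory=dict)
    # Assumed watermark: this shard has installed every commit with commit_ts <= applied_ts.
    applied_ts: int = 0

    def install(self, key, commit_ts, value):
        """Install one committed version and advance the watermark (simplified)."""
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.versions[key].sort()
        self.applied_ts = max(self.applied_ts, commit_ts)

    def advance(self, ts):
        """Heartbeat: no commit with commit_ts <= ts is still missing on this shard."""
        self.applied_ts = max(self.applied_ts, ts)

    def read_at(self, key, snapshot_ts):
        """Return the newest value of key with commit_ts <= snapshot_ts, or None."""
        result = None
        for ts, value in self.versions.get(key, []):
            if ts <= snapshot_ts:
                result = value
        return result


def read_fresh_no_wait(parts, placement):
    """Freshest, zero-delay read: each shard returns its newest version.
    Not atomic: it can expose only part of a multi-shard transaction."""
    return {k: parts[i].read_at(k, float("inf")) for k, i in placement.items()}


def read_stable_snapshot(parts, placement):
    """Atomic, zero-delay read at the minimum watermark across all shards.
    Never waits, but may return data from the past."""
    snapshot = min(p.applied_ts for p in parts)
    return {k: parts[i].read_at(k, snapshot) for k, i in placement.items()}


def can_read_without_waiting(parts, snapshot_ts):
    """An atomic read at snapshot_ts is non-blocking only if every shard has applied it."""
    return all(p.applied_ts >= snapshot_ts for p in parts)


# Scenario: T1 (ts=1) wrote x and y; T2 (ts=5) also writes x and y,
# but so far only shard 0 has installed T2's write.
p0, p1 = Partition(), Partition()
p0.install("x", 1, "x1")
p1.install("y", 1, "y1")
p0.install("x", 5, "x5")
p1.advance(4)  # shard 1 is complete up to ts=4 but has not yet received T2
parts, placement = [p0, p1], {"x": 0, "y": 1}

print(read_fresh_no_wait(parts, placement))    # {'x': 'x5', 'y': 'y1'}: sees only half of T2
print(read_stable_snapshot(parts, placement))  # {'x': 'x1', 'y': 'y1'}: atomic but stale
print(can_read_without_waiting(parts, 5))      # False: an atomic read of T2 must wait for shard 1
```

Once shard 1 installs T2's write (e.g. `p1.install("y", 5, "y5")`), both read functions return `{'x': 'x5', 'y': 'y5'}`. Until then, in this toy model a reader must give up atomicity, freshness, or non-blocking behaviour, mirroring the trade-off stated above.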
