TARDiS: A Branch-and-Merge Approach To Weak Consistency

This paper presents the design, implementation, and evaluation of TARDiS (Transactional Asynchronously Replicated Divergent Store), a transactional key-value store explicitly designed for weakly-consistent systems. Reasoning about these systems is hard, as neither causal consistency nor per-object eventual convergence allow applications to deal satisfactorily with write-write conflicts. TARDiS instead exposes as its fundamental abstraction the set of conflicting branches that arise in weakly-consistent systems. To this end, TARDiS introduces a new concurrency control mechanism: branch-on-conflict. On the one hand, TARDiS guarantees that storage will appear sequential to any thread of execution that extends a branch, keeping application logic simple. On the other, TARDiS provides applications, when needed, with the tools and context necessary to merge branches atomically, when and how applications want. Since branch-on-conflict in TARDiS is fast, weakly-consistent applications can benefit from adopting this paradigm not only for operations issued by different sites, but also, when appropriate, for conflicting local operations. We find that TARDiS reduces coding complexity for these applications and that judicious branch-on-conflict can improve their local throughput at each site by two to eight times.

[1]  J. T. Robinson,et al.  On optimistic methods for concurrency control , 1979, TODS.

[2]  Joan Manuel Marquès,et al.  A Commutative Replicated Data Type for Cooperative Editing , 2009, 2009 29th IEEE International Conference on Distributed Computing Systems.

[3]  Marc Shapiro,et al.  Non-monotonic Snapshot Isolation: Scalable and Strong Consistency for Geo-replicated Transactional Systems , 2013, 2013 IEEE 32nd International Symposium on Reliable Distributed Systems.

[4]  Michael J. Freedman,et al.  Stronger Semantics for Low-Latency Geo-Replicated Storage , 2013, NSDI.

[5]  Jim Gray,et al.  A critique of ANSI SQL isolation levels , 1995, SIGMOD '95.

[6]  Ali Ghodsi,et al.  The potential dangers of causal consistency and an explicit solution , 2012, SoCC '12.

[7]  Joseph M. Hellerstein,et al.  Consistency without borders , 2013, SoCC.

[8]  John S. Heidemann,et al.  Implementation of the Ficus Replicated File System , 1990, USENIX Summer.

[9]  Michael J. Freedman,et al.  Don't settle for eventual: scalable causal consistency for wide-area storage with COPS , 2011, SOSP.

[10]  Marcos K. Aguilera,et al.  Consistency-based service level agreements for cloud storage , 2013, SOSP.

[11]  Cheng Li,et al.  Making geo-replicated systems fast as possible, consistent when necessary , 2012, OSDI 2012.

[12]  Rusty Klophaus,et al.  Riak Core: building distributed applications without shared state , 2010, CUFP '10.

[13]  Margo I. Seltzer,et al.  Berkeley DB , 1999, USENIX Annual Technical Conference, FREENIX Track.

[14]  Marvin Theimer,et al.  Managing update conflicts in Bayou, a weakly connected replicated storage system , 1995, SOSP.

[15]  Marcos K. Aguilera,et al.  Transactional storage for geo-replicated systems , 2011, SOSP.

[16]  Hans-Arno Jacobsen,et al.  PNUTS: Yahoo!'s hosted data serving platform , 2008, Proc. VLDB Endow..

[17]  Idit Keidar,et al.  Trusting the cloud , 2009, SIGA.

[18]  濱野 純 入門Git : The fast version control system , 2009 .

[19]  Amin Vahdat,et al.  Design and evaluation of a conit-based continuous consistency model for replicated services , 2002, TOCS.

[20]  Marcos K. Aguilera,et al.  Transaction chains: achieving serializability with low latency in geo-distributed storage systems , 2013, SOSP.

[21]  Tim Kraska,et al.  MDCC: multi-data center consistency , 2012, EuroSys '13.

[22]  Marvin Theimer,et al.  Session guarantees for weakly consistent replicated data , 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.

[23]  Marcos K. Aguilera,et al.  Olive: Distributed Point-in-Time Branching Storage for Real Systems , 2006, NSDI.

[24]  Fernando Pedone,et al.  Augustus: scalable and robust storage for cloud applications , 2013, EuroSys '13.

[25]  Gil Neiger,et al.  Causal memory: definitions, implementation, and programming , 1995, Distributed Computing.

[26]  Patrick Th. Eugster,et al.  The "art" of programming gossip-based systems , 2007, OPSR.

[27]  Dennis Shasha,et al.  Secure Untrusted Data Repository (SUNDR) , 2004, OSDI.

[28]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[29]  Liuba Shrira,et al.  Exo-Leasing: Escrow Synchronization for Mobile Clients of Commodity Storage Servers , 2008, Middleware.

[30]  Ali Ghodsi,et al.  Bolt-on causal consistency , 2013, SIGMOD '13.

[31]  Amin Vahdat,et al.  Design and evaluation of a continuous consistency model for replicated services , 2000, OSDI.

[32]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[33]  Ariel J. Feldman,et al.  SPORC: Group Collaboration using Untrusted Cloud Resources , 2010, OSDI.

[34]  Eric A. Brewer,et al.  Towards robust distributed systems (abstract) , 2000, PODC '00.

[35]  Nancy A. Lynch,et al.  Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services , 2002, SIGA.

[36]  David Mazières,et al.  Replication, history, and grafting in the Ori file system , 2013, SOSP.

[37]  Marc Shapiro,et al.  A comprehensive study of Convergent and Commutative Replicated Data Types , 2011 .

[38]  Chengzheng Sun,et al.  Operational transformation in real-time group editors: issues, algorithms, and achievements , 1998, CSCW '98.

[39]  Srinath T. V. Setty,et al.  Depot: Cloud Storage with Minimal Trust , 2010, TOCS.

[40]  Ethan Katz-Bassett,et al.  SPANStore: cost-effective geo-replicated storage spanning multiple cloud services , 2013, SOSP.

[41]  Philip A. Bernstein,et al.  Categories and Subject Descriptors: H.2.4 [Database Management]: Systems. , 2022 .

[42]  Willy Zwaenepoel,et al.  GentleRain: Cheap and Scalable Causal Consistency with Physical Clocks , 2014, SoCC.

[43]  Daniel J. Abadi,et al.  Calvin: fast distributed transactions for partitioned database systems , 2012, SIGMOD Conference.

[44]  Christopher Frost,et al.  Spanner: Google's Globally-Distributed Database , 2012, OSDI.

[45]  Marc Shapiro,et al.  Conflict-Free Replicated Data Types , 2011, SSS.

[46]  Marcos K. Aguilera,et al.  Sinfonia: a new paradigm for building scalable distributed systems , 2007, SOSP.