Warp: Lightweight Multi-Key Transactions for Key-Value Stores

Traditional NoSQL systems scale by sharding data across multiple servers and by performing each operation on a small number of servers. Because transactions on multiple keys necessarily require coordination across multiple servers, NoSQL systems often explicitly avoid making transactional guarantees in order to avoid such coordination. Past work on transactional systems controls this coordination by increasing the granularity at which transactions are ordered, by sacrificing serializability, or by making clock synchronicity assumptions. This paper presents a novel protocol for providing serializable transactions on top of a sharded data store. Called acyclic transactions, this protocol allows multiple transactions to prepare and commit simultaneously, improving concurrency in the system, while ensuring that no cycles form between concurrently committing transactions. We have fully implemented acyclic transactions in a document store called Warp. Experiments show that Warp achieves 4 times higher throughput than Sinfonia's mini-transactions on the standard TPC-C benchmark with no aborts. Further, the system achieves 75% of the throughput of the non-transactional key-value store it builds upon.
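
The requirement that no cycles form between transactions corresponds to a textbook property: a set of transactions is conflict-serializable exactly when its "must come before" graph is acyclic. The sketch below illustrates only that property and is not Warp's protocol, which the abstract does not detail. The function name `has_cycle`, the edge-list representation, and the transaction labels `T1`, `T2`, `T3` are assumptions made for illustration.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the directed graph has a cycle.

    `edges` is an iterable of (before, after) pairs, each meaning
    "transaction `before` must be ordered ahead of transaction `after`".
    This is a generic depth-first search, not Warp's commit protocol.
    """
    graph = defaultdict(list)
    nodes = set()
    for before, after in edges:
        graph[before].append(after)
        nodes.update((before, after))

    # WHITE: unvisited, GRAY: on the current DFS path, BLACK: fully explored.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in nodes}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                return True  # back edge: the ordering constraints contradict
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in nodes)


# Hypothetical orderings: the first admits a serial order, the second does not.
serializable = [("T1", "T2"), ("T2", "T3")]
conflicting = [("T1", "T2"), ("T2", "T1")]
print(has_cycle(serializable), has_cycle(conflicting))
```

When the graph is acyclic, any topological order of its nodes is a valid serial order for the transactions; a cycle means no such order exists, so at least one of the transactions involved cannot commit as ordered.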
