Robust data sharing with key-value stores

A key-value store (KVS) offers functions for storing and retrieving values associated with unique keys. KVSs have become the most popular way to access Internet-scale “cloud” storage systems. We present an efficient wait-free algorithm that emulates multi-reader multi-writer storage from a set of potentially faulty KVS replicas in an asynchronous environment. Our implementation serves an unbounded number of clients that use the storage concurrently. It tolerates crashes of a minority of the KVSs and crashes of any number of clients. Our algorithm minimizes the space overhead at the KVSs and comes in two variants, providing regular and atomic semantics, respectively. Compared with prior solutions, it is inherently scalable and allows clients to write concurrently. Because of the limited interface of a KVS, textbook-style solutions for reliable storage either do not work or incur a prohibitively large storage overhead. Our algorithm maintains two copies of the stored value per KVS in the common case, and we show that this is indeed necessary. If there are concurrent write operations, the maximum space complexity of the algorithm grows in proportion to the point contention. A series of simulations explores the behavior of the algorithm, and benchmarks obtained with KVS cloud-storage providers demonstrate its practicality.
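
To make the setting concrete, the following is a minimal sketch of the general idea of emulating shared storage on top of crash-prone KVS replicas using majority quorums and timestamped keys. It is not the algorithm of this paper: it runs sequentially, ignores the races between concurrent clients that the actual algorithm must handle, and keeps only one copy per replica. The `KVS` interface (`put`, `get`, `list`, `remove`), the `Client` class, and the `(sequence number, client id)` key format are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Timestamp = Tuple[int, str]  # (sequence number, client id); hypothetical key format


class KVS:
    """In-memory stand-in for one KVS replica (hypothetical interface)."""

    def __init__(self) -> None:
        self.data: Dict[Timestamp, str] = {}
        self.crashed = False

    def _check(self) -> None:
        if self.crashed:
            raise ConnectionError("replica crashed")

    def put(self, key: Timestamp, value: str) -> None:
        self._check()
        self.data[key] = value

    def get(self, key: Timestamp) -> Optional[str]:
        self._check()
        return self.data.get(key)

    def list(self) -> List[Timestamp]:
        self._check()
        return list(self.data)

    def remove(self, key: Timestamp) -> None:
        self._check()
        self.data.pop(key, None)


class Client:
    """Reads and writes one logical value through a majority of replicas."""

    def __init__(self, client_id: str, replicas: List[KVS]) -> None:
        self.client_id = client_id
        self.replicas = replicas
        self.majority = len(replicas) // 2 + 1

    def _on_majority(self, op: Callable[[KVS], object]) -> list:
        # Sequential simplification: try every replica and skip crashed ones.
        # An asynchronous client would proceed after the first majority replies.
        results = []
        for replica in self.replicas:
            try:
                results.append(op(replica))
            except ConnectionError:
                continue
        if len(results) < self.majority:
            raise RuntimeError("no majority of replicas responded")
        return results

    def write(self, value: str) -> Timestamp:
        # Phase 1: learn the highest sequence number stored at a majority.
        listings = self._on_majority(lambda r: r.list())
        max_seq = max((k[0] for keys in listings for k in keys), default=0)
        ts = (max_seq + 1, self.client_id)

        # Phase 2: store under the new timestamp, then drop older versions.
        def store(replica: KVS) -> bool:
            replica.put(ts, value)
            for key in replica.list():
                if key < ts:
                    replica.remove(key)
            return True

        self._on_majority(store)
        return ts

    def read(self) -> Optional[str]:
        # Fetch the newest version from each replica, return the newest overall.
        def newest(replica: KVS):
            keys = replica.list()
            if not keys:
                return None
            key = max(keys)
            return key, replica.get(key)

        found = [r for r in self._on_majority(newest) if r is not None]
        if not found:
            return None
        return max(found, key=lambda kv: kv[0])[1]


# Usage: three replicas, one of which has crashed.
replicas = [KVS(), KVS(), KVS()]
replicas[0].crashed = True
Client("c1", replicas).write("v1")
Client("c2", replicas).write("v2")
latest = Client("c3", replicas).read()
```

In this sketch any two majorities intersect, so a read that follows a completed write contacts at least one replica holding that write's timestamp. The step that deletes older keys is exactly where a naive design becomes unsafe under concurrency, since a reader may list a key that a writer removes before the reader fetches it; handling that case with little extra space is the kind of problem the abstract refers to.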
