Fast and strongly-consistent per-item resilience in key-value stores

In-memory key-value stores (KVSs) provide different forms of resilience through basic r-way replication and more complex erasure codes such as Reed-Solomon. Each storage scheme exhibits different trade-offs between reliability and resource usage (memory, network load, latency, storage, etc.). Unfortunately, most KVSs support only a single storage scheme, forcing designers to deploy different KVSs for different applications. To address this problem, we have designed Ring, a strongly consistent in-memory KVS that lets users set the level of resilience per KV pair while maintaining overall consistency and without compromising efficiency. At the heart of Ring lies a novel encoding scheme, Stretched Reed-Solomon coding, that combines the hash key distributions of heterogeneous replication and erasure coding schemes. Ring uses RDMA to ensure low latencies and to offload communication tasks. Its latency, bandwidth, and throughput are comparable to those of state-of-the-art systems that do not support changing resilience and thus have much higher memory overheads. We show use cases that demonstrate significant memory savings and discuss trade-offs between reliability, performance, and cost. Our work demonstrates how future applications that consciously manage the resilience of their KV pairs can reduce overall operational cost and significantly improve the performance of KVS deployments.
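The memory trade-off between replication and erasure coding mentioned above can be illustrated with a small back-of-the-envelope sketch. It relies only on the standard properties of the two schemes: r-way replication stores r full copies and survives r - 1 node failures, while a Reed-Solomon code with k data blocks and m parity blocks stores (k + m) / k times the data and survives m failures. The names `Scheme`, `replication`, `reed_solomon`, and `total_memory` are hypothetical helpers introduced for illustration; they are not part of Ring, and the sketch does not model Stretched Reed-Solomon coding itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A storage scheme described by its data and redundant block counts."""
    name: str
    data_blocks: int       # blocks holding the original data
    redundant_blocks: int  # extra copies or parity blocks

    @property
    def overhead(self) -> float:
        # Bytes stored per byte of user data.
        return (self.data_blocks + self.redundant_blocks) / self.data_blocks

    @property
    def tolerated_failures(self) -> int:
        # Both schemes survive the loss of as many blocks as are redundant.
        return self.redundant_blocks


def replication(r: int) -> Scheme:
    # r-way replication: one original plus r - 1 extra copies.
    return Scheme(f"{r}-way replication", 1, r - 1)


def reed_solomon(k: int, m: int) -> Scheme:
    # RS(k, m): k data blocks plus m parity blocks.
    return Scheme(f"RS({k},{m})", k, m)


def total_memory(items) -> float:
    # items: iterable of (value_size_in_bytes, Scheme) pairs,
    # i.e. a per-item choice of resilience (illustrative only).
    return sum(size * scheme.overhead for size, scheme in items)


if __name__ == "__main__":
    rep3 = replication(3)
    rs42 = reed_solomon(4, 2)
    for s in (rep3, rs42):
        print(f"{s.name}: overhead {s.overhead:.2f}x, "
              f"tolerates {s.tolerated_failures} failures")

    # A small hot item kept replicated, a large cold item erasure coded.
    mixed = [(100, rep3), (1000, rs42)]
    uniform = [(100, rep3), (1000, rep3)]
    print("mixed schemes:", total_memory(mixed), "bytes")
    print("replication only:", total_memory(uniform), "bytes")
```

In this toy setting both schemes tolerate two failures, but the erasure-coded item uses half the memory of its replicated counterpart, which is the kind of saving that choosing resilience per KV pair is meant to expose.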
