Coupling Decentralized Key-Value Stores with Erasure Coding

Modern decentralized key-value stores often replicate and distribute data via consistent hashing for availability and scalability. Compared to replication, erasure coding is a promising redundancy approach that provides availability guarantees at much lower storage cost. However, when combined with consistent hashing, erasure coding incurs substantial parity updates during scaling (i.e., adding or removing nodes) and cannot efficiently handle degraded reads caused by scaling. In this paper, we propose a novel erasure coding model called FragEC, which incurs no parity updates during scaling. We further extend consistent hashing with multiple hash rings so that erasure coding can seamlessly address degraded reads during scaling. We realize our design as an in-memory key-value store called ECHash and conduct testbed experiments on different scaling workloads in both local and cloud environments. We show that ECHash achieves better scaling performance (in terms of scaling throughput and degraded read latency during scaling) than the baseline erasure coding implementation, while maintaining high basic I/O and node repair performance.
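
The scaling problem described above can be illustrated with a toy model. The sketch below is a minimal, hypothetical example that assumes a plain consistent-hashing ring without virtual nodes, MD5-derived ring positions, a single XOR parity chunk protecting three data nodes, and a made-up packing in which each node's data chunk is the concatenation of its values. None of these choices come from the paper, and the sketch models neither FragEC nor ECHash. It only shows why a naive coupling of consistent hashing and erasure coding needs parity updates when a node is added: the keys that move to the new node leave its successor, so that node's data chunk changes, and so must the parity computed over it.

```python
import bisect
import hashlib


def ring_hash(label: str) -> int:
    """Place a label on a 2^32-position ring (MD5-based; an illustrative choice)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:4], "big")


class HashRing:
    """Minimal consistent-hashing ring (no virtual nodes, no replication)."""

    def __init__(self, nodes):
        self._ring = sorted((ring_hash(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (ring_hash(node), node))

    def owner(self, key: str) -> str:
        """A key belongs to the first node at or clockwise after its position."""
        positions = [pos for pos, _ in self._ring]
        i = bisect.bisect_left(positions, ring_hash(key)) % len(self._ring)
        return self._ring[i][1]


def data_chunk(ring, node, values):
    """Hypothetical packing: concatenate the node's values in key order."""
    return b"".join(values[k] for k in sorted(values) if ring.owner(k) == node)


def xor_parity(chunks, size):
    """Single-parity code: XOR the zero-padded chunks byte by byte."""
    parity = bytearray(size)
    for chunk in chunks:
        for i, b in enumerate(chunk.ljust(size, b"\0")):
            parity[i] ^= b
    return bytes(parity)


values = {f"key{i}": f"val{i}".encode() for i in range(100)}
stripe = ["A", "B", "C"]  # data nodes protected by one parity chunk
ring = HashRing(stripe)

owners_before = {k: ring.owner(k) for k in values}
chunks_before = [data_chunk(ring, n, values) for n in stripe]
size = max(len(c) for c in chunks_before)  # fixed stripe width
parity_before = xor_parity(chunks_before, size)

ring.add_node("D")  # scaling: add one node to the ring
owners_after = {k: ring.owner(k) for k in values}
moved = [k for k in values if owners_before[k] != owners_after[k]]

# The moved keys all leave D's successor, so that node's chunk shrinks and
# the parity of the stripe it belongs to must be recomputed.
chunks_after = [data_chunk(ring, n, values) for n in stripe]
parity_after = xor_parity(chunks_after, size)

print(f"{len(moved)} of {len(values)} keys moved to D")
print("parity update needed:", parity_before != parity_after)
```

Without virtual nodes, only the new node's successor gives up keys, so in this toy model exactly one data chunk (and therefore the parity) changes whenever any key moves; with many stripes and virtual nodes, such updates would spread across many stripes.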