RE-Store: Reliable and Efficient KV-Store with Erasure Coding and Replication

In-memory key/value stores (KV-stores) are a key building block for numerous applications running on clusters. As clusters have grown in scale, efficiency and availability have become increasingly critical. Traditional replication provides redundancy but is inefficient because of its high storage overhead. Erasure coding can provide data reliability with significantly lower storage requirements, but it is used primarily for long-term archival data because of its limited write performance. Recent studies have attempted to combine the two techniques by using replication for frequently updated metadata and erasure coding for large, read-only data. In this study, we propose RE-Store, an in-memory key/value store that uses a novel hybrid replication/erasure coding scheme to achieve both efficiency and reliability. RE-Store introduces replication into erasure coding: it makes one copy of each encoded datum and replaces part of the parity with replicas to improve storage efficiency. When failures occur, it uses these replicas to keep data available and thus avoids the inefficiencies of erasure coding during repair. RE-Store provides fault tolerance through fast, online recovery under different failure scenarios with little performance degradation. We have implemented RE-Store on a real key/value system and conducted extensive evaluations to validate its design and to study its performance, efficiency, and reliability. Experimental results show that RE-Store performs similarly to erasure coding and replication under normal operation while saving 18% to 34% of the memory used by replication when tolerating 2 to 4 failures.
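The storage trade-off between the two baseline techniques can be made concrete with a short calculation. The sketch below is illustrative only: it compares plain replication with a plain (k, m) maximum-distance-separable erasure code, using the textbook overhead formulas. The function names and the choice k = 4 are assumptions made for this example. It does not model RE-Store's hybrid scheme and does not reproduce the 18% to 34% savings reported above.

```python
from fractions import Fraction


def replication_overhead(failures: int) -> Fraction:
    """Storage per unit of user data when keeping failures + 1 full copies."""
    return Fraction(failures + 1)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Storage per unit of user data for a (k, m) MDS code.

    k data blocks plus m parity blocks tolerate the loss of any m blocks.
    """
    return Fraction(k + m, k)


if __name__ == "__main__":
    k = 4  # assumed number of data blocks per stripe (hypothetical choice)
    for failures in (2, 3, 4):
        rep = replication_overhead(failures)
        ec = erasure_overhead(k, failures)
        print(f"tolerate {failures}: replication {float(rep):.2f}x, "
              f"erasure coding {float(ec):.2f}x")
```

Under these assumptions, tolerating two failures costs 3x storage with replication and 1.5x with a (4, 2) code. A hybrid scheme that keeps replicas for fast recovery would fall somewhere between the two.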
