Partial-parallel-repair (PPR): a distributed technique for repairing erasure coded storage

With the explosion of data in applications all around us, erasure coded storage has emerged as an attractive alternative to replication because even with significantly lower storage overhead, they provide better reliability against data loss. Reed-Solomon code is the most widely used erasure code because it provides maximum reliability for a given storage overhead and is flexible in the choice of coding parameters that determine the achievable reliability. However, reconstruction time for unavailable data becomes prohibitively long mainly because of network bottlenecks. Some proposed solutions either use additional storage or limit the coding parameters that can be used. In this paper, we propose a novel distributed reconstruction technique, called Partial Parallel Repair (PPR), which divides the reconstruction operation to small partial operations and schedules them on multiple nodes already involved in the data reconstruction. Then a distributed protocol progressively combines these partial results to reconstruct the unavailable data blocks and this technique reduces the network pressure. Theoretically, our technique can complete the network transfer in ⌈(log2(k + 1))⌉ time, compared to k time needed for a (k, m) Reed-Solomon code. Our experiments show that PPR reduces repair time and degraded read time significantly. Moreover, our technique is compatible with existing erasure codes and does not require any additional storage overhead. We demonstrate this by overlaying PPR on top of two prior schemes, Local Reconstruction Code and Rotated Reed-Solomon code, to gain additional savings in reconstruction time.

[1]  Å. Björck,et al.  Solution of Vandermonde Systems of Equations , 1970 .

[2]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[3]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[4]  H. Michael Ji An optimized processor for fast Reed-Solomon encoding and decoding , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5]  Ben Y. Zhao,et al.  Pond: The OceanStore Prototype , 2003, FAST.

[6]  Robbert van Renesse,et al.  Chain Replication for Supporting High Throughput and Availability , 2004, OSDI.

[7]  Stefan Savage,et al.  Total Recall: System Support for Automated Availability Management , 2004, NSDI.

[8]  Rodrigo Rodrigues,et al.  High Availability in DHTs: Erasure Coding vs. Replication , 2005, IPTPS.

[9]  Rajeev Thakur,et al.  Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..

[10]  Andreas Haeberlen,et al.  Efficient Replica Maintenance for Distributed Storage Systems , 2006, NSDI.

[11]  Lihao Xu,et al.  Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Network Storage Applications , 2006, Fifth IEEE International Symposium on Network Computing and Applications (NCA'06).

[12]  Dennis Goeckel,et al.  An adaptive Reed-Solomon errors-and-erasures decoder , 2006, FPGA '06.

[13]  S.A. Brandt,et al.  CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[14]  Alexandros G. Dimakis,et al.  Network Coding for Distributed Storage Systems , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[15]  Amin Vahdat,et al.  A scalable, commodity data center network architecture , 2008, SIGCOMM '08.

[16]  Albert G. Greenberg,et al.  VL2: a scalable and flexible data center network , 2009, SIGCOMM '09.

[17]  Catherine D. Schuman,et al.  A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage , 2009, FAST.

[18]  Alexandros G. Dimakis,et al.  Rebuilding for array codes in distributed storage systems , 2010, 2010 IEEE Globecom Workshops.

[19]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[20]  Van-Anh Truong,et al.  Availability in Globally Distributed Storage Systems , 2010, OSDI.

[21]  John C. S. Lui,et al.  Optimal recovery of single disk failure in RDP code storage systems , 2010, SIGMETRICS '10.

[22]  Ju Wang,et al.  Windows Azure Storage: a highly available cloud storage service with strong consistency , 2011, SOSP.

[23]  Nihar B. Shah,et al.  Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction , 2010, IEEE Transactions on Information Theory.

[24]  Yang Tang,et al.  NCCloud: applying network coding for the storage repair in a cloud-of-clouds , 2012, FAST.

[25]  Cheng Huang,et al.  Erasure Coding in Windows Azure Storage , 2012, USENIX Annual Technical Conference.

[26]  Cheng Huang,et al.  Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads , 2012, FAST.

[27]  Lluis Pamies-Juarez,et al.  CORE: Cross-object redundancy for efficient data repair in storage systems , 2013, 2013 IEEE International Conference on Big Data.

[28]  Dimitris S. Papailiopoulos,et al.  XORing Elephants: Novel Erasure Codes for Big Data , 2013, Proc. VLDB Endow..

[29]  Kannan Ramchandran,et al.  A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster , 2013, HotStorage.

[30]  Sriram Rao,et al.  A The Quantcast File System , 2013, Proc. VLDB Endow..

[31]  Dimitris S. Papailiopoulos,et al.  Locally Repairable Codes , 2012, IEEE Transactions on Information Theory.

[32]  Lakshmi Ganesh,et al.  Lazy Means Smart: Reducing Repair Bandwidth Costs in Erasure-coded Distributed Storage , 2014, SYSTOR 2014.

[33]  Cory Hill,et al.  f4: Facebook's Warm BLOB Storage System , 2014, OSDI.

[34]  Kannan Ramchandran,et al.  A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers , 2014 .

[35]  Martin Schulz,et al.  Accurate application progress analysis for large-scale parallel debugging , 2014, PLDI.

[36]  Bruce M. Maggs,et al.  A Universal Approach to Data Center Network Design , 2014, SPAA.

[37]  Mario Blaum,et al.  A Tale of Two Erasure Codes in HDFS , 2015, FAST.

[38]  Kannan Ramchandran,et al.  Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth , 2015, FAST.

[39]  Hai Liu,et al.  PErasure: A parallel Cauchy Reed-Solomon coding library for GPUs , 2015, 2015 IEEE International Conference on Communications (ICC).

[40]  Baochun Li,et al.  Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems , 2017, IEEE Transactions on Parallel and Distributed Systems.

[41]  F. Moore,et al.  Polynomial Codes Over Certain Finite Fields , 2017 .

[42]  O. Antoine,et al.  Theory of Error-correcting Codes , 2022 .