A new piggybacking design for systematic MDS storage codes

Distributed storage codes have important applications in the design of modern storage systems. In a distributed storage system, any storage node may fail, and once an individual node fails, it must be reconstructed from the data stored in the surviving nodes. Computation load and network bandwidth are two important issues to consider when repairing a failed node. Generally speaking, conventional maximum distance separable (MDS) storage codes have low repair complexity but high repair bandwidth, whereas minimum storage regenerating (MSR) codes have low repair bandwidth but high repair complexity. Fortunately, the recently introduced piggybacked codes combine the advantages of both. The main result of this paper is a novel piggybacking design framework for $$(k+r,k)$$ systematic MDS storage codes, where $$k$$ and $$r$$ denote the number of systematic nodes and the number of parity nodes, respectively. In the new code, the average repair bandwidth rate for the systematic nodes, i.e., the ratio of the average repair bandwidth of a single failed systematic node to the amount of the original data, can be as low as $$\sqrt{\frac{2}{r}}+\frac{1}{2r}+\frac{3}{k}+\frac{\sqrt{2r}}{k^2}$$, which is roughly $$\sqrt{\frac{2}{r}}+\frac{1}{2r}$$ when the code has high rate ($$k\gg r$$). For relatively large $$r$$ (e.g., $$r\ge 6$$), this result significantly improves on the previously known one, which has an average repair bandwidth rate of roughly $$\frac{r-1}{2r-1}$$. Meanwhile, every failed systematic node of the new code can be reconstructed quickly using the decoding algorithm of a classical MDS code, with only a few extra additions over the underlying finite field.
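To make the piggybacking idea concrete, the sketch below implements a toy $$(4,2)$$ code with two substripes, in the spirit of the well-known introductory example of the original piggybacking framework; it is not the new construction of this paper. The field GF(7), the coefficient rows, and the names `encode`, `decode`, `repair_node0` and `solve2` are illustrative assumptions. Substripe $$a=(a_1,a_2)$$ is encoded with a plain MDS code, and $$a_1$$ is "piggybacked" onto the last parity symbol of substripe $$b=(b_1,b_2)$$. Repair of a systematic node then needs only subtractions on top of ordinary MDS decoding.

```python
from itertools import combinations

P = 7  # toy prime field GF(7): every symbol is an integer mod P (assumption)

# Generator coefficients of a (4, 2) MDS code for ONE substripe:
# nodes 0 and 1 are systematic, nodes 2 and 3 are parity.
ROWS = [(1, 0), (0, 1), (1, 1), (1, 2)]


def encode(a1, a2, b1, b2):
    """Store substripes a = (a1, a2) and b = (b1, b2) on four nodes."""
    nodes = []
    for c1, c2 in ROWS:
        nodes.append([(c1 * a1 + c2 * a2) % P, (c1 * b1 + c2 * b2) % P])
    nodes[3][1] = (nodes[3][1] + a1) % P  # piggyback a1 onto the last parity of b
    return [tuple(n) for n in nodes]


def solve2(i, j, yi, yj):
    """Solve ROWS[i].x = yi, ROWS[j].x = yj over GF(P) with Cramer's rule."""
    (p, q), (s, t) = ROWS[i], ROWS[j]
    det = (p * t - q * s) % P
    inv = pow(det, P - 2, P)  # Fermat inverse; det != 0 for every pair of rows
    x1 = ((yi * t - q * yj) * inv) % P
    x2 = ((p * yj - s * yi) * inv) % P
    return x1, x2


def decode(nodes, i, j):
    """Data collector: recover (a1, a2, b1, b2) from any two nodes i, j."""
    a1, a2 = solve2(i, j, nodes[i][0], nodes[j][0])  # substripe a is plain MDS
    yb = {n: nodes[n][1] for n in (i, j)}
    if 3 in yb:
        yb[3] = (yb[3] - a1) % P  # strip the piggyback once a1 is known
    b1, b2 = solve2(i, j, yb[i], yb[j])
    return a1, a2, b1, b2


def repair_node0(nodes):
    """Rebuild systematic node 0 = (a1, b1) from three downloaded symbols."""
    b2 = nodes[1][1]     # b2 from systematic node 1
    b_sum = nodes[2][1]  # b1 + b2 from parity node 2
    b_pig = nodes[3][1]  # b1 + 2*b2 + a1 from parity node 3
    b1 = (b_sum - b2) % P
    a1 = (b_pig - b1 - 2 * b2) % P  # remove the MDS part, leaving the piggyback
    return (a1, b1), 3  # rebuilt contents and number of symbols downloaded


data = (3, 5, 2, 6)
nodes = encode(*data)
print(repair_node0(nodes))  # ((3, 2), 3)
```

Repairing node 0 reads 3 symbols, whereas naive MDS repair downloads two whole nodes (4 symbols). The MDS property is preserved because any data collector that sees the piggybacked symbol can first decode substripe $$a$$ and then subtract $$a_1$$ before decoding substripe $$b$$.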
