Repair rate lower bounds for distributed storage

One of the primary objectives of a distributed storage system is to reliably store a large amount $dsize$ of source data for a long duration using a large number $N$ of unreliable storage nodes, each with capacity $nsize$. The storage overhead $\beta$ is the fraction of total system capacity in excess of $dsize$, i.e., $\beta = 1 - \frac{dsize}{N \cdot nsize}$. Storage nodes fail randomly over time and are replaced with initially empty nodes, so data is erased from the system at an average rate $erate = \lambda \cdot N \cdot nsize$, where $1/\lambda$ is the average lifetime of a node before failure. To maintain recoverability of the source data, a repairer continually reads data from nodes over a network at some average rate $rrate$, and generates and writes data to nodes based on the data it has read. The main result is that, for any repairer, if the source data is recoverable at each point in time, then $rrate \ge \frac{erate}{2 \cdot \beta}$ asymptotically as $N$ goes to infinity and $\beta$ goes to zero. This inequality provides a fundamental lower bound on the average rate at which any repairer must read data from the system in order to maintain recoverability of the source data.
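To make the quantities concrete, the following is a minimal sketch that evaluates $\beta$, $erate$, and the bound $\frac{erate}{2 \cdot \beta}$ from the definitions above. The function names, the units (terabytes and years), and the example numbers are illustrative assumptions, not taken from the paper. The bound holds asymptotically as $N$ goes to infinity and $\beta$ goes to zero, so for finite parameters the result should be read as indicative only.

```python
def storage_overhead(dsize: float, n_nodes: int, nsize: float) -> float:
    """beta = 1 - dsize / (N * nsize): fraction of capacity beyond the source data."""
    return 1.0 - dsize / (n_nodes * nsize)


def erasure_rate(lam: float, n_nodes: int, nsize: float) -> float:
    """erate = lambda * N * nsize, where 1/lambda is the mean node lifetime."""
    return lam * n_nodes * nsize


def repair_rate_lower_bound(dsize: float, n_nodes: int, nsize: float, lam: float) -> float:
    """Asymptotic lower bound on the read rate: erate / (2 * beta)."""
    beta = storage_overhead(dsize, n_nodes, nsize)
    if not 0.0 < beta < 1.0:
        # beta <= 0 means the source data does not fit with any spare capacity.
        raise ValueError("storage overhead beta must lie strictly between 0 and 1")
    return erasure_rate(lam, n_nodes, nsize) / (2.0 * beta)


if __name__ == "__main__":
    # Hypothetical system: 1000 nodes of 10 TB each, 9000 TB of source data,
    # and a mean node lifetime of 4 years (lambda = 0.25 per year).
    bound = repair_rate_lower_bound(dsize=9000.0, n_nodes=1000, nsize=10.0, lam=0.25)
    print(f"read rate lower bound: {bound:.1f} TB/year")
```

In this hypothetical configuration $\beta = 0.1$ and $erate = 2500$ TB per year, so the bound is about 12,500 TB per year, five times the erasure rate. Halving the overhead to $\beta = 0.05$ doubles the bound, which illustrates the tradeoff between storage overhead and repair traffic that the inequality expresses.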
