Asymmetric regenerating codes for heterogeneous distributed storage systems

Distributed storage systems provide reliability by distributing data over multiple storage nodes. When a node fails, a new node is introduced to the system to maintain the availability of the stored data. The new node downloads information from surviving nodes, called helper nodes, to recover the data lost in the failed node. The number of helper nodes is called the repair degree. Compared to traditional approaches such as replication and erasure codes, recently proposed regenerating codes can significantly reduce the repair bandwidth in homogeneous distributed storage systems. Most existing works focus on uniform settings, in which the repair degree and the repair bandwidth are the same for every node. However, due to network structure or connectivity limitations, the number of helper nodes required may differ from one failed node to another. Furthermore, given limits on network traffic or bandwidth, the amount of information that can be downloaded from each helper node may also vary. We are therefore motivated to investigate heterogeneous distributed storage systems in which the repair degree and the amount of information downloaded from each helper node can differ. To obtain the minimal bandwidth needed to recover a failed node, we construct an information flow graph for such heterogeneous systems. By analyzing the cut-set bound of the information flow graph, we derive the optimal tradeoff between storage capacity and repair bandwidth. We then propose asymmetric regenerating codes that achieve the optimal tradeoff curve, and present a linear construction of these codes. Compared with previous regenerating codes, asymmetric regenerating codes are shown to have a lower repair bandwidth under a certain constraint condition, with a reduction of up to 36.2%.
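For orientation, the sketch below evaluates the classical cut-set bound for the homogeneous setting that this work generalizes, as given by Dimakis et al. in "Network Coding for Distributed Storage Systems". It is not the asymmetric bound derived in this paper. It assumes every node stores `alpha` symbols, each of `d` helpers sends `beta` symbols, and any `k` nodes must suffice to recover a file of size `M`. The function names are hypothetical and chosen for illustration only.

```python
from fractions import Fraction


def cut_set_capacity(k, d, alpha, beta):
    """Classical homogeneous cut-set bound: sum_{i=0}^{k-1} min(alpha, (d - i) * beta).

    A file of size M can be stored reliably only if M <= this value.
    """
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(M, k, d):
    """Minimum-storage regenerating point: alpha = M / k (assumes integer M, k, d)."""
    alpha = Fraction(M, k)
    beta = Fraction(M, k * (d - k + 1))
    return alpha, beta


def mbr_point(M, k, d):
    """Minimum-bandwidth regenerating point: alpha = d * beta (assumes integer M, k, d)."""
    beta = Fraction(2 * M, k * (2 * d - k + 1))
    return d * beta, beta


if __name__ == "__main__":
    M, k, d = 4, 2, 3  # illustrative values, not taken from the paper
    for name, point in (("MSR", msr_point), ("MBR", mbr_point)):
        alpha, beta = point(M, k, d)
        gamma = d * beta  # total repair bandwidth in the homogeneous case
        print(name, "alpha =", alpha, "gamma =", gamma,
              "capacity =", cut_set_capacity(k, d, alpha, beta))
```

Both extreme points meet the bound with equality, which is why they mark the two ends of the homogeneous storage-bandwidth tradeoff curve. In the heterogeneous setting studied here, `d` and `beta` would vary per node, so a single pair of scalars as above would no longer describe the system.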
