Erasure Coded Storage on a Changing Network: The Untold Story

As faster storage devices become commercially viable alternatives to disk drives, the network is increasingly the bottleneck for performance in distributed storage systems. This is especially true for erasure-coded storage, where reconstructing lost data can place a heavy burden on the system. Consequently, much research has focused on reducing the amount of data transferred during repair. However, most of this work assumes that the network has a uniform, static structure. One reason is that many state-of-the-art codes either have a fixed repair mechanism or offer only a constrained choice of repair strategies, and therefore, in theory, benefit less from being network aware. We propose a general mechanism that explores the space of possible repairs, and we examine how much different types of erasure codes benefit from being network aware. Using both theoretical modeling and simulation, we show significant gains for three erasure codes. We also consider the practical applicability of the proposed mechanism by restricting the search space to repairs that can potentially achieve minimal cost, and we present a case study of random linear network coding (RLNC), a class of flexible codes.
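To illustrate why the choice of repair matters once the network is no longer uniform, here is a minimal sketch under simplifying assumptions that are not taken from the paper: an MDS code (e.g. Reed-Solomon) in which any k surviving nodes can rebuild a lost chunk, each helper sends one full chunk, and the cost of a repair is the sum of per-link transfer costs. The node names, link costs, and the functions `repair_cost`, `cheapest_repair`, and `costliest_repair` are hypothetical, and the exhaustive search only mirrors the general idea of exploring the space of repairs; it is not the authors' mechanism.

```python
from itertools import combinations

def repair_cost(helpers, link_cost, chunk_size=1.0):
    """Total transfer cost when each helper ships chunk_size units over its own link."""
    return sum(link_cost[h] * chunk_size for h in helpers)

def cheapest_repair(link_cost, k, chunk_size=1.0):
    """Exhaustively search every k-subset of surviving nodes (MDS-style repair)."""
    candidates = combinations(sorted(link_cost), k)
    best = min(candidates, key=lambda hs: repair_cost(hs, link_cost, chunk_size))
    return best, repair_cost(best, link_cost, chunk_size)

def costliest_repair(link_cost, k, chunk_size=1.0):
    """Worst valid choice: an upper bound on what a network-unaware repair could pay."""
    candidates = combinations(sorted(link_cost), k)
    worst = max(candidates, key=lambda hs: repair_cost(hs, link_cost, chunk_size))
    return worst, repair_cost(worst, link_cost, chunk_size)

# Hypothetical per-unit transfer costs from each surviving node to the replacement node.
link_cost = {"n1": 1.0, "n2": 4.0, "n3": 1.5, "n4": 3.0, "n5": 2.0}

print(cheapest_repair(link_cost, k=3))   # (('n1', 'n3', 'n5'), 4.5)
print(costliest_repair(link_cost, k=3))  # (('n2', 'n4', 'n5'), 9.0)

# On a uniform network every valid repair costs the same, so helper choice is irrelevant.
uniform = {node: 1.0 for node in link_cost}
print({repair_cost(hs, uniform) for hs in combinations(sorted(uniform), 3)})  # {3.0}
```

In this toy setting the cheapest and most expensive valid repairs differ by a factor of two, while a uniform network makes all choices equivalent. With purely additive costs, picking the k cheapest links is already optimal; the search space grows for codes such as regenerating codes or RLNC, where the number of helpers and the amount each helper sends can vary.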
