Maximum Distance Separable Array Codes Allowing Partial Collaboration

This letter considers the problem of repairing multiple node failures through partial collaboration in a distributed storage system (DSS). In the repair process, each failed node first connects to $d\geq k$ surviving nodes to download data, and then exchanges data with other repairing nodes. Partial collaboration allows each failed node to connect to only some (not all) of the other repairing nodes to exchange data. Constructions of partially collaborative regenerating codes with $d=k$ in the minimum-storage regime have been studied before. We propose a code construction using maximum distance separable (MDS) array codes to achieve $d>k$, and show that the constructed code asymptotically approaches the minimum-storage repair point as the number of failed nodes grows.
