Centralized multi-node repair in distributed storage

In distributed storage systems, multiple storage node failures are frequent, and recovering from them efficiently is crucial for system performance. In this work, we consider the problem of repairing multiple failures in a centralized way, which can be desirable in many data storage configurations. We first establish the tradeoff between the repair bandwidth and the storage size for functional repair. Using a graph-theoretic approach, we identify the optimal tradeoff as the solution to an integer optimization problem, for which we derive a closed-form expression. When the number of erasures e satisfies e ≥ k, where k is the minimum number of nodes needed to reconstruct the entire data, the tradeoff reduces to a single point, for which we provide an explicit code construction. We also derive expressions for the extreme points, namely the minimum storage multi-node repair (MSMR) and minimum bandwidth multi-node repair (MBMR) points. Furthermore, we prove that the functional MBMR point is not achievable by linear exact repair codes. Finally, when e divides both k and d (e | k and e | d), where d is the number of helper nodes during repair, we show that the functional repair tradeoff is not achievable under exact repair, except possibly for a small portion near the MSMR point. This parallels the results for single-erasure repair by Shah et al.
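The graph-theoretic approach can be illustrated with a small numerical sketch. It assumes an information-flow-graph model in the spirit of [15], extended to centralized repair. In this extension, a central node downloads β units from each of d helpers (γ = dβ) and writes α units to each of e new nodes. Consider a data collector that takes u_i nodes from the i-th repaired group, with 1 ≤ u_i ≤ e and Σ u_i = k. It is assumed to see a cut of Σ_i min(u_i α, (d − Σ_{j<i} u_j) β), which must be at least the file size B. This cut formula is an assumption made for illustration and is not the paper's closed-form expression. The helper names `compositions`, `cut_value`, `min_cut`, `min_beta` and `msmr_gamma` are hypothetical.

```python
from typing import Iterator, Tuple


def compositions(k: int, e: int) -> Iterator[Tuple[int, ...]]:
    """Yield every ordered tuple (u_1, ..., u_g) with 1 <= u_i <= e summing to k."""
    if k == 0:
        yield ()
        return
    for first in range(1, min(e, k) + 1):
        for rest in compositions(k - first, e):
            yield (first,) + rest


def cut_value(u: Tuple[int, ...], alpha: float, beta: float, d: int) -> float:
    """Cheapest cut for a data collector taking u[i] nodes from the i-th
    repaired group (assumed model, not the paper's exact formulation)."""
    total, seen = 0.0, 0
    for ui in u:
        # Cut either the ui storage edges (ui * alpha) or the helper edges
        # into the central node, skipping helpers already on the collector side.
        total += min(ui * alpha, (d - seen) * beta)
        seen += ui
    return total


def min_cut(alpha: float, beta: float, k: int, d: int, e: int) -> float:
    """Minimum cut over all collector choices; must be >= B for recovery."""
    return min(cut_value(u, alpha, beta, d) for u in compositions(k, e))


def min_beta(B: float, alpha: float, k: int, d: int, e: int, iters: int = 100) -> float:
    """Smallest per-helper download beta with min_cut >= B, found by bisection.
    Assumes d >= k and alpha >= B / k, so beta = B is always feasible."""
    lo, hi = 0.0, float(B)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if min_cut(alpha, mid, k, d, e) >= B:
            hi = mid
        else:
            lo = mid
    return hi


def msmr_gamma(B: float, k: int, d: int, e: int) -> float:
    """Total repair bandwidth at alpha = B/k under this cut model:
    e*d*B / (k*(d - k + e)) for e <= k, and B for e >= k."""
    if e >= k:
        return float(B)
    return e * d * B / (k * (d - k + e))
```

As an example, take B = 12, k = 3, d = 4 and e = 2 with α = B/k = 4. The bisection returns β ≈ 8/3, so γ = dβ ≈ 32/3, which matches e d B/(k(d − k + e)). With e ≥ k, the single-group choice u = (k) forces dβ ≥ B for every α. This is consistent with the tradeoff collapsing to a single point, which under this model is (α, γ) = (B/k, B). Sweeping α upward from B/k and recording d · min_beta(...) traces the (α, γ) curve implied by the assumed cuts.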

[1]  Arman Fazeli, et al.  Minimum Storage Regenerating Codes for All Parameters, 2017, IEEE Transactions on Information Theory.

[2]  Anne-Marie Kermarrec, et al.  Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes, 2011, 2011 International Symposium on Network Coding.

[3]  Kannan Ramchandran, et al.  Exact Regenerating Codes for Distributed Storage, 2009, ArXiv.

[4]  Kannan Ramchandran, et al.  Asymptotic Interference Alignment for Optimal Repair of MDS Codes in Distributed Storage, 2013, IEEE Transactions on Information Theory.

[5]  Sriram Vishwanath, et al.  Progress on High-Rate MSR Codes: Enabling Arbitrary Number of Helper Nodes, 2016, 2016 Information Theory and Applications Workshop (ITA).

[6]  Mehran Elyasi, et al.  Linear Exact Repair Rate Region of (k + 1, k, k) Distributed Storage Systems: A New Approach, 2015, 2015 IEEE International Symposium on Information Theory (ISIT).

[7]  Alexander Barg, et al.  Explicit Constructions of High-Rate MDS Array Codes With Optimal Repair Bandwidth, 2016, IEEE Transactions on Information Theory.

[8]  Kenneth W. Shum, et al.  Cooperative Regenerating Codes, 2012, IEEE Transactions on Information Theory.

[9]  A. Dimakis, et al.  Deterministic Regenerating Codes for Distributed Storage, 2007.

[10]  Rudolf Ahlswede, et al.  Network Information Flow, 2000, IEEE Transactions on Information Theory.

[11]  Jehoshua Bruck, et al.  Explicit MDS Codes for Optimal Repair Bandwidth, 2014, ArXiv.

[12]  Sanjay Ghemawat, et al.  The Google File System, 2003.

[13]  Kannan Ramchandran, et al.  Distributed Storage Codes With Repair-by-Transfer and Nonachievability of Interior Points on the Storage-Bandwidth Tradeoff, 2010, IEEE Transactions on Information Theory.

[14]  Jehoshua Bruck, et al.  Optimal Rebuilding of Multiple Erasures in MDS Codes, 2016, IEEE Transactions on Information Theory.

[15]  Alexandros G. Dimakis, et al.  Network Coding for Distributed Storage Systems, 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[16]  Stefan Savage, et al.  Total Recall: System Support for Automated Availability Management, 2004, NSDI.

[17]  Kannan Ramchandran, et al.  Interference Alignment in Regenerating Codes for Distributed Storage: Necessity and Code Constructions, 2010, IEEE Transactions on Information Theory.

[18]  Tracey Ho, et al.  A Random Linear Network Coding Approach to Multicast, 2006, IEEE Transactions on Information Theory.

[19]  Jehoshua Bruck, et al.  Zigzag Codes: MDS Array Codes With Optimal Rebuilding, 2011, IEEE Transactions on Information Theory.

[20]  Baochun Li, et al.  Cooperative Repair with Minimum-Storage Regenerating Codes for Distributed Storage, 2014, IEEE INFOCOM 2014 - IEEE Conference on Computer Communications.

[21]  Chi Wan Sung, et al.  Broadcast Repair for Wireless Distributed Storage Systems, 2015, 2015 10th International Conference on Information, Communications and Signal Processing (ICICS).

[22]  Dimitris S. Papailiopoulos, et al.  Repair Optimal Erasure Codes Through Hadamard Designs, 2011, IEEE Transactions on Information Theory.

[23]  Cheng Huang, et al.  Optimal Repair of MDS Codes in Distributed Storage via Subspace Interference Alignment, 2011, ArXiv.

[24]  Yunnan Wu, et al.  Reducing Repair Traffic for Erasure Coding-Based Storage via Interference Alignment, 2009, 2009 IEEE International Symposium on Information Theory.

[25]  Zhifang Zhang, et al.  Exact Cooperative Regenerating Codes with Minimum-Repair-Bandwidth for Distributed Storage, 2013, 2013 Proceedings IEEE INFOCOM.

[26]  Sriram Vishwanath, et al.  Centralized Repair of Multiple Node Failures With Applications to Communication Efficient Secret Sharing, 2016, IEEE Transactions on Information Theory.

[27]  Kannan Ramchandran, et al.  Exact-Repair MDS Code Construction Using Interference Alignment, 2011, IEEE Transactions on Information Theory.

[28]  Zhiying Wang, et al.  Centralized Multi-Node Repair in Distributed Storage, 2016.