Tree-structured parallel regeneration for multiple data losses in distributed storage systems based on erasure codes

To reduce the time required to complete the regeneration process of erasure codes, we propose a Tree-structured Parallel Regeneration (TPR) scheme for multiple data losses in distributed storage systems. Under the scheme, two algorithms are proposed for the construction of multiple regeneration trees, namely the edge-disjoint algorithm and edge-sharing algorithm. The edge-disjoint algorithm constructs multiple independent trees, and is simple and appropriate for environments where newcomers and their providers are distributed over a large area and have few intersections. The edge-sharing algorithm constructs multiple trees that compete to utilize the bandwidth, and make a better utilization of the bandwidth, although it needs to measure the available bandwidth and deal with the bandwidth changes; it is therefore difficult to implement in practical systems. The parallel regeneration for multiple data losses of TPR primarily includes two optimizations: firstly, transferring the data through the bandwidth optimized-paths in a pipeline manner; secondly, executing data regeneration over multiple trees in parallel. To evaluate the proposal, we implement an event-based simulator and make a detailed comparison with some popular regeneration methods. The quantitative comparison results show that the use of TPR employing either the edge-disjoint algorithm or edge-sharing algorithm reduces the regeneration time significantly.