Topology-Aware Node Selection for Data Regeneration in Heterogeneous Distributed Storage Systems

Distributed storage systems introduce redundancy to protect data from node failures. After a storage node fails, the lost data should be regenerated at a replacement storage node as soon as possible to maintain the same level of redundancy. Minimizing this regeneration time is critical to the reliability of distributed storage systems. Existing work aims to reduce the regeneration time by either minimizing the regenerating traffic or adjusting the regenerating traffic patterns, whereas the nodes participating in data regeneration are generally assumed to be given beforehand. However, the regeneration time also depends heavily on the selection of the participating nodes, because different selections involve different data links between the nodes. Real-world distributed storage systems usually exhibit heterogeneous link capacities, so the regeneration time can be reduced further by exploiting these capacity differences and avoiding link bottlenecks. In this paper, we consider minimizing the regeneration time by selecting the participating nodes in heterogeneous networks. We analyze the regeneration time and propose node selection algorithms for overlay networks and real-world topologies. Because allowing each provider to supply a flexible number of data blocks can strongly influence the regeneration time, we design several techniques to enhance our schemes in overlay networks. Experimental results show that our node selection schemes can significantly reduce the regeneration time for each topology, especially in practical networks with heterogeneous link capacities.
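
To make the effect of node selection concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: every selected provider sends one block of equal size directly to the replacement node, all transfers run in parallel, and the regeneration time is therefore set by the slowest selected link. The function names `regeneration_time` and `select_providers`, and the example capacities, are hypothetical. The sketch illustrates bottleneck avoidance only and is not the node selection algorithm proposed in the paper.

```python
from typing import Dict, List, Tuple


def regeneration_time(capacities: List[float], block_size: float) -> float:
    """Time for parallel direct transfers of one block per provider.

    Assumption: transfers are independent, so the slowest link
    (the bottleneck) determines when regeneration finishes.
    """
    return block_size / min(capacities)


def select_providers(
    link_capacity: Dict[str, float], d: int, block_size: float
) -> Tuple[List[str], float]:
    """Pick the d providers with the highest link capacity to the newcomer.

    link_capacity maps a provider name to the capacity of its link to the
    replacement node (hypothetical units, e.g. MB/s).
    """
    if d <= 0 or d > len(link_capacity):
        raise ValueError("d must be between 1 and the number of providers")
    # Sort providers by capacity, largest first, and keep the top d.
    ranked = sorted(link_capacity, key=lambda n: link_capacity[n], reverse=True)
    chosen = ranked[:d]
    time = regeneration_time([link_capacity[n] for n in chosen], block_size)
    return chosen, time


if __name__ == "__main__":
    caps = {"A": 100.0, "B": 20.0, "C": 80.0, "D": 50.0}
    chosen, time = select_providers(caps, d=3, block_size=400.0)
    print(chosen, time)
```

Under this simplified model, choosing providers A, C, and D avoids the 20-unit link to B, so the bottleneck capacity is 50 rather than 20. Real topologies with shared links, relayed transfers, or unequal block counts per provider need the fuller analysis the paper describes.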
