Topology-Aware Data Placement Strategy for Fault-Tolerant Storage Systems

In distributed storage systems, fault-tolerance methods such as replication and erasure coding are adopted to guarantee data reliability. These methods ensure that data can be recovered through redundancy when a storage node fails. However, redundancy often incurs nontrivial bandwidth overhead, because large numbers of replicas and blocks must be transmitted. Prior methods focus on reducing the network cost through careful scheduling. In this article, we aim to improve transmission efficiency along an orthogonal dimension: optimizing storage locations according to the characteristics of data center networks. We focus on server-centric data centers (such as BCube), where any pair of nodes is interconnected by multiple redundant paths. Transmissions of replicas or blocks can therefore be sped up significantly by using the redundant paths concurrently. Inspired by this insight, we design the node-disjoint storage strategy for the multireplica storage system and the nested node-disjoint storage strategy for the erasure-coded storage system. Evaluations indicate that our methods save 46.6%–62.1% of the transmission time in the multireplica storage system and 71.5%–80.8% in the erasure-coded storage system, compared with conventional methods adopted in current storage systems.
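To make the idea of redundant paths concrete, the following is a minimal sketch rather than the strategy proposed in the article. It assumes a BCube-like addressing scheme in which each server is identified by a tuple of digits and a single hop (through one switch) changes exactly one digit. The function names `rotated_paths` and `are_node_disjoint` and the example addresses are hypothetical and chosen for illustration only. For two servers whose addresses differ in every digit, correcting the digits in rotated order yields one path per digit position, and these paths share no intermediate server.

```python
from itertools import chain


def rotated_paths(src, dst):
    """Return one path per starting digit, correcting digits in rotated order.

    Addresses are tuples of digits. One hop changes exactly one digit,
    which models crossing a single switch in a BCube-like layout.
    Assumption of this sketch: src and dst differ in every digit.
    """
    width = len(src)
    if len(dst) != width or any(s == d for s, d in zip(src, dst)):
        raise ValueError("addresses must have equal length and differ in every digit")
    paths = []
    for start in range(width):
        node = list(src)
        path = [tuple(node)]
        for step in range(width):
            pos = (start + step) % width  # rotate the correction order
            node[pos] = dst[pos]
            path.append(tuple(node))
        paths.append(path)
    return paths


def are_node_disjoint(paths):
    """True if no intermediate server appears on more than one path."""
    inner = list(chain.from_iterable(p[1:-1] for p in paths))
    return len(inner) == len(set(inner))


# Hypothetical three-digit addresses (digits 0..3).
paths = rotated_paths((0, 0, 0), (1, 2, 3))
for p in paths:
    print(" -> ".join("".join(map(str, n)) for n in p))
print(are_node_disjoint(paths))
```

Under these assumptions, a replica or block could be split into three parts and each part sent along a different path, so the transfers do not compete for the same intermediate servers.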
