A decentralized redundancy generation scheme for codes with locality in distributed storage systems

The growing volume of data in many applications creates a pressing need for reliable data management in distributed storage systems. Classical erasure codes, such as Reed-Solomon codes and local reconstruction codes, are widely adopted by distributed storage systems. However, existing research has focused mainly on designing new optimized codes and has largely ignored how the encoding process with the classical codes is carried out, even though an inefficient encoding process significantly degrades the encoding performance of a distributed storage system. Performing the encoding process efficiently has therefore become a key challenge in adopting the classical codes. In this paper, we propose D2CP, a decentralized redundancy generation scheme for codes with locality. D2CP uses a two-step framework that supports both data patterns (replication-to-encoding and direct encoding) and codes with locality under any parameter set. To improve insertion throughput, D2CP uses a data placement technique based on consistent hashing to guide node selection. To reduce network traffic cost, D2CP uses a data sending scheduling technique that schedules transmissions from the source nodes, together with a cooperative parity generation technique in which nodes generate the parity data cooperatively. To evaluate D2CP, we conduct experiments on our RAID distributed storage system under various parameter settings, using both 30 physical and 200 virtual servers. Extensive experiments show that, compared with typical approaches, D2CP improves encoding throughput by 20% and 32% on average and reduces network traffic cost by 16% and 33% on average for the two data patterns, respectively.
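To make "codes with locality" concrete, the sketch below shows only the local-parity part of a local-reconstruction-style code: the k data blocks are split into groups, each group gets one XOR parity, and a single lost block is rebuilt from the other blocks of its group rather than from all k blocks. It is an illustrative assumption, not D2CP's encoder; the helper names and the parameters (k = 6, two groups of 3) are hypothetical, and the global parities that such codes also carry (typically Reed-Solomon-style, computed over a finite field) are omitted.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    # zip(*blocks) yields one tuple of byte values per position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def local_parities(data: List[bytes], group_size: int) -> List[bytes]:
    """Hypothetical helper: split data blocks into consecutive groups, one XOR parity each."""
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    return [xor_blocks(group) for group in groups]


def repair_in_group(group: List[bytes], parity: bytes, lost: int) -> bytes:
    """Rebuild block `lost` using only its group's survivors plus the local parity."""
    survivors = [block for i, block in enumerate(group) if i != lost]
    return xor_blocks(survivors + [parity])


# Hypothetical stripe: k = 6 data blocks of 4 bytes, two local groups of 3.
data = [bytes([i] * 4) for i in range(1, 7)]
parities = local_parities(data, group_size=3)
# Repairing block 1 reads 3 blocks (2 survivors + 1 parity) instead of all 6.
rebuilt = repair_in_group(data[0:3], parities[0], lost=1)
```

The point of locality is visible in the repair call: the number of blocks read scales with the group size, not with k.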

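The abstract states that D2CP guides node selection with consistent hashing but does not give the exact rule. The following is a minimal sketch under one common assumption: each server is placed on a hash ring through several virtual nodes, and a stripe is assigned to the first `count` distinct servers found clockwise from the stripe's hash. The class name, the SHA-1 hash, the virtual-node count, and the stripe size of 10 (for example, 6 data plus 4 parity blocks) are all hypothetical.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    """Map a string to a 64-bit ring position (SHA-1 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100):
        # Each physical node appears `vnodes` times to smooth the load.
        self._ring = sorted((_hash(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self._keys = [h for h, _ in self._ring]
        self._distinct = len(set(nodes))

    def select_nodes(self, stripe_id: str, count: int) -> List[str]:
        """Return `count` distinct nodes walking clockwise from the stripe's position."""
        if count > self._distinct:
            raise ValueError("not enough distinct nodes for the stripe")
        i = bisect.bisect(self._keys, _hash(stripe_id)) % len(self._ring)
        chosen: List[str] = []
        while len(chosen) < count:
            node = self._ring[i][1]
            if node not in chosen:  # skip repeated virtual nodes of the same server
                chosen.append(node)
            i = (i + 1) % len(self._ring)
        return chosen


# Hypothetical cluster of 30 servers; one stripe of 10 blocks.
ring = HashRing([f"server{i}" for i in range(30)])
placement = ring.select_nodes("stripe-42", count=10)
```

A property that makes consistent hashing attractive for placement is that adding or removing a server only remaps the stripes whose clockwise window touches that server's virtual nodes, rather than reshuffling every stripe.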