Dynamic Erasure Coding Policy Allocation (DECPA) in Hadoop 3.0

Erasure Code (EC) is tipped to become the most promising alternative to replication for providing redundancy in cloud storage. Major players such as Microsoft and Facebook already have initial implementations based on erasure codes. Hadoop 0.20 was the first version to support erasure coding (as HDFS-RAID), but EC was left out of later versions and only resurfaced in Hadoop 3.0 (HDFS-EC). Hadoop 3.0.0 supports three default Erasure Code policies: RS(3,2), RS(6,3) and RS(10,4). For greater flexibility, this work implements six new Erasure Code policies [RS(4,3), RS(5,3), RS(7,3), RS(7,4), RS(8,4), RS(9,4)] and develops a Dynamic Erasure Coding Policy Allocation that selects the policy producing the minimum overhead, in order to maximize storage capacity. Three types of dynamic allocation are proposed and implemented. A performance evaluation of the new policies was conducted to find the optimal one for a NAS/SAN architecture, and the effectiveness of the three dynamic EC policy allocation schemes is reported.
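
The abstract does not define how "overhead" is measured. As a minimal sketch, assuming overhead means the parity bytes a file occupies under HDFS's striped layout with 1024k cells (each full stripe of k data cells costs one cell per parity unit, and a trailing partial stripe costs parity cells as long as its first data cell), a minimum-overhead allocator over the nine policies could look like the following. The names `Policy`, `parity_overhead` and `allocate`, and the tie-breaking rule, are hypothetical illustrations, not the paper's implementation or a Hadoop API.

```python
from dataclasses import dataclass

MIB = 1024 * 1024
CELL_SIZE = 1 * MIB  # assumption: 1024k cells, as in Hadoop's RS-k-m-1024k policies


@dataclass(frozen=True)
class Policy:
    k: int  # number of data units
    m: int  # number of parity units

    @property
    def name(self) -> str:
        return f"RS({self.k},{self.m})"


# The three Hadoop 3.0.0 defaults plus the six new policies named in the abstract.
POLICIES = [Policy(k, m) for k, m in [
    (3, 2), (6, 3), (10, 4),
    (4, 3), (5, 3), (7, 3), (7, 4), (8, 4), (9, 4),
]]


def parity_overhead(file_size: int, p: Policy, cell: int = CELL_SIZE) -> int:
    """Parity bytes stored for a file under a striped RS(k,m) layout (assumed model).

    Each full stripe (k data cells) adds one full cell per parity unit; a trailing
    partial stripe adds parity as long as its first (largest) data cell,
    i.e. min(cell, remaining bytes).
    """
    stripe = p.k * cell
    full_stripes, remainder = divmod(file_size, stripe)
    per_parity_unit = full_stripes * cell + min(cell, remainder)
    return p.m * per_parity_unit


def allocate(file_size: int, policies=POLICIES) -> Policy:
    """Hypothetical allocator: least parity overhead, ties broken by fewer parity, then data, units."""
    return min(policies, key=lambda p: (parity_overhead(file_size, p), p.m, p.k))


if __name__ == "__main__":
    for size in (512 * 1024, 7 * MIB, 10 * MIB, 1024 * MIB):
        best = allocate(size)
        print(f"{size / MIB:>8.1f} MiB -> {best.name}, "
              f"{parity_overhead(size, best) / MIB:.2f} MiB parity")
```

Under these assumptions, a 512 KiB file is assigned RS(3,2), a 7 MiB file RS(7,3), and files of 10 MiB and 1 GiB RS(10,4): small files favour policies with few parity units, while large files converge to the policy with the lowest ratio m/k. A practical allocator would probably also need to check that the cluster has enough DataNodes for the k + m units of each candidate policy.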
