Is it Time to Revisit Erasure Coding in Data-Intensive Clusters?

Data-intensive clusters rely heavily on distributed storage systems to accommodate the unprecedented growth of data. The Hadoop Distributed File System (HDFS) is the primary storage system for data analytics frameworks such as Spark and Hadoop. Traditionally, HDFS uses replication to ensure data availability and to allow locality-aware task execution for data-intensive applications. Recently, erasure coding (EC) has emerged as an alternative to replication in storage systems, owing to the continuous reduction in its computation overhead. In this work, we conduct an extensive experimental study to understand the performance of data-intensive applications under replication and EC. We use representative benchmarks on the Grid'5000 testbed to evaluate how analytic workloads, data persistency, failures, back-end storage devices, and network configuration affect application performance. Our study sheds light not only on the potential benefits of erasure coding in data-intensive clusters but also on the aspects that may help to realize it effectively.
