Degraded-First Scheduling for MapReduce in Erasure-Coded Storage Clusters

We have witnessed an increasing adoption of erasure coding in modern clustered storage systems to reduce the storage overhead of traditional 3-way replication. However, it remains an open issue of how to customize the data analytics paradigm for erasure-coded storage, especially when the storage system operates in failure mode. We propose degraded-first scheduling, a new MapReduce scheduling scheme that improves MapReduce performance in erasure-coded clustered storage systems in failure mode. Its main idea is to launch degraded tasks earlier so as to leverage the unused network resources. We conduct mathematical analysis and discrete event simulation to show the performance gain of degraded-first scheduling over Hadoop's default locality-first scheduling. We further implement degraded-first scheduling on Hadoop and conduct test bed experiments in a 13-node cluster. We show that degraded-first scheduling reduces the MapReduce runtime of locality-first scheduling.

[1]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[2]  Bin Zhou,et al.  Scalable Performance of the Panasas Parallel File System , 2008, FAST.

[3]  Cheng Huang,et al.  Erasure Coding in Windows Azure Storage , 2012, USENIX Annual Technical Conference.

[4]  Lluis Pamies-Juarez,et al.  The CORE Storage Primitive: Cross-Object Redundancy for Efficient Data Repair & Access in Erasure Coded Storage , 2013, ArXiv.

[5]  Xiaoqiao Meng,et al.  Delay tails in MapReduce scheduling , 2012, SIGMETRICS '12.

[6]  Van-Anh Truong,et al.  Availability in Globally Distributed Storage Systems , 2010, OSDI.

[7]  Srikanth Kandula,et al.  Leveraging endpoint flexibility in data-intensive clusters , 2013, SIGCOMM.

[8]  Ethan L. Miller,et al.  Screaming fast Galois field arithmetic using intel SIMD instructions , 2013, FAST.

[9]  Dimitris S. Papailiopoulos,et al.  Simple regenerating codes: Network coding for cloud storage , 2011, 2012 Proceedings IEEE INFOCOM.

[10]  Gregory R. Ganger,et al.  Ursa minor: versatile cluster-based storage , 2005, FAST'05.

[11]  Y. Charlie Hu,et al.  PIKACHU: How to Rebalance Load in Optimizing MapReduce On Heterogeneous Clusters , 2013, USENIX Annual Technical Conference.

[12]  Garth A. Gibson,et al.  DiskReduce: RAID for data-intensive scalable computing , 2009, PDSW '09.

[13]  Scott Shenker,et al.  Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling , 2010, EuroSys '10.

[14]  Kun-Lung Wu,et al.  FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads , 2010, Middleware.

[15]  Ju Wang,et al.  Windows Azure Storage: a highly available cloud storage service with strong consistency , 2011, SOSP.

[16]  Albert G. Greenberg,et al.  Reining in the Outliers in Map-Reduce Clusters using Mantri , 2010, OSDI.

[17]  Catherine D. Schuman,et al.  A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage , 2009, FAST.

[18]  Quan Zhang,et al.  Fast and Cautious Evolution of Cloud Storage , 2010, HotStorage.

[19]  Randy H. Katz,et al.  Improving MapReduce Performance in Heterogeneous Environments , 2008, OSDI.

[20]  Xiaosong Ma,et al.  Does erasure coding have a role to play in my data center , 2010 .

[21]  Garth A. Gibson,et al.  DiskReduce : Replication as a Prelude to Erasure Coding in Data-Intensive Scalable Computing , 2011 .

[22]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[23]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[24]  Patrick P. C. Lee,et al.  Parity logging with reserved space: towards efficient updates and recovery in erasure-coded clustered storage , 2014, FAST.

[25]  Sachin Katti,et al.  Copysets: Reducing the Frequency of Data Loss in Cloud Storage , 2013, USENIX Annual Technical Conference.

[26]  Marek Karpinski,et al.  An XOR-based erasure-resilient coding scheme , 1995 .

[27]  Cheng Huang,et al.  Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads , 2012, FAST.

[28]  Dimitris S. Papailiopoulos,et al.  XORing Elephants: Novel Erasure Codes for Big Data , 2013, Proc. VLDB Endow..

[29]  Daniel P. Siewiorek,et al.  Architectures and algorithms for on-line failure recovery in redundant disk arrays , 1994, Distributed and Parallel Databases.

[30]  Jian Lin,et al.  CORE: Augmenting regenerating-coding-based recovery for single and concurrent failures in distributed storage systems , 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).

[31]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[32]  GhemawatSanjay,et al.  The Google file system , 2003 .