FlexSlot: Moving Hadoop Into the Cloud with Flexible Slot Management

Load imbalance is a major source of overhead in Hadoop, where the uneven distribution of input data among tasks can significantly delay job completion. Running Hadoop in a private cloud opens up opportunities for mitigating data skew through elastic resource allocation, in which stragglers are expedited with more resources. However, it also introduces problems that often cancel out the performance gain: (1) performance interference from co-running jobs may create new stragglers, and (2) a semantic gap exists between Hadoop task management and resource pool-based virtual cluster management, which prevents efficient resource usage. We present FlexSlot, a user-transparent task slot management scheme that automatically identifies map stragglers and resizes their slots accordingly to accelerate task execution. FlexSlot also adaptively changes the number of slots on each virtual node to promote efficient usage of the resource pool. Experimental results with representative benchmarks show that FlexSlot reduces job completion time by 46% and achieves better resource utilization.
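
The general idea of identifying map stragglers and resizing their slots can be illustrated with a minimal sketch. The abstract does not state FlexSlot's actual detection criterion or resizing policy, so everything below is an assumption made for illustration: the `MapTask` fields, the progress-rate heuristic (a task is flagged when its rate falls below a fraction of the median rate), the `threshold` value, and the doubling of `slot_cpus` up to `max_cpus` are all hypothetical and not taken from the paper or from Hadoop's API.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class MapTask:
    # All fields are hypothetical; they are not Hadoop or FlexSlot data structures.
    task_id: str
    progress: float      # fraction of input processed, 0.0 to 1.0
    elapsed_s: float     # seconds since the task started
    slot_cpus: int = 1   # CPU share currently given to the task's slot


def progress_rate(task):
    """Progress made per second; 0.0 for a task that has not run yet."""
    return task.progress / task.elapsed_s if task.elapsed_s > 0 else 0.0


def find_stragglers(tasks, threshold=0.5):
    """Assumed heuristic: flag tasks slower than threshold * median rate."""
    if not tasks:
        return []
    med = median(progress_rate(t) for t in tasks)
    return [t for t in tasks if progress_rate(t) < threshold * med]


def resize_slots(tasks, max_cpus=4, threshold=0.5):
    """Assumed policy: double each straggler's slot size, capped at max_cpus.

    Returns the ids of the tasks whose slots were considered for resizing.
    """
    stragglers = find_stragglers(tasks, threshold)
    for t in stragglers:
        t.slot_cpus = min(t.slot_cpus * 2, max_cpus)
    return [t.task_id for t in stragglers]


tasks = [
    MapTask("m1", progress=0.8, elapsed_s=100),
    MapTask("m2", progress=0.9, elapsed_s=100),
    MapTask("m3", progress=0.2, elapsed_s=100),  # skewed input, slow progress
]
resized = resize_slots(tasks)
```

In this toy run only `m3` falls below half of the median progress rate, so only its slot grows. A real system would also have to account for the interference and resource-pool constraints that the abstract describes, which this sketch ignores.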
