Improving MapReduce performance in heterogeneous environments with adaptive task tuning

The deployment of MapReduce in datacenters and clouds presents several challenges in achieving good job performance. Compared to in-house dedicated clusters, datacenters and clouds often exhibit significant hardware and performance heterogeneity due to continuous server replacement and multi-tenant interference. As most MapReduce implementations assume homogeneous clusters, heterogeneity can cause significant load imbalance in task execution, leading to poor performance and low cluster utilization. Despite existing optimizations for task scheduling and load balancing, MapReduce still performs poorly on heterogeneous clusters. In this paper, we find that the homogeneous configuration of tasks on heterogeneous nodes can be an important source of load imbalance and thus a cause of poor performance. Tasks should be customized with different settings to match the capabilities of heterogeneous nodes. To this end, we propose an adaptive task tuning approach, Ant, that automatically finds the optimal settings for individual tasks running on different nodes. Ant works best for large jobs with multiple rounds of map task execution. It first configures tasks with randomly selected configurations and gradually improves task settings by reproducing the settings of the best-performing tasks and discarding poorly performing configurations. To accelerate task tuning and avoid becoming trapped in local optima, Ant uses genetic functions during task configuration. Experimental results on a heterogeneous cluster and a virtual cluster with varying hardware capabilities show that Ant improves the average job completion time by 23%, 11%, and 16% compared to stock Hadoop, customized Hadoop with industry recommendations, and a profiling-based configuration approach, respectively.
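The tuning loop described above (random initial configurations, reproduction of the best-performing settings, removal of poor ones, and genetic operators to escape local optima) can be illustrated with a minimal sketch. This is not Ant's implementation: the parameter names `sort_buffer_mb` and `spill_percent`, their ranges, the synthetic cost function `simulated_task_time`, and all function names are assumptions made purely for illustration. In a real cluster the cost would be a measured task completion time.

```python
import random

# Hypothetical search space: parameter name -> (low, high) integer range.
# These names and ranges are illustrative, not taken from the paper.
SEARCH_SPACE = {
    "sort_buffer_mb": (50, 400),
    "spill_percent": (50, 95),
}


def random_config(rng):
    """Draw one task configuration uniformly at random."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Build a child by taking each parameter from one of the two parents."""
    return {name: rng.choice((parent_a[name], parent_b[name])) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Re-draw each parameter with probability `rate` to keep exploring."""
    child = dict(config)
    for name, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[name] = rng.randint(lo, hi)
    return child


def simulated_task_time(config, node_memory_mb):
    """Synthetic stand-in for a measured task completion time (assumption).

    The best buffer size is made to depend on node memory, so different
    nodes prefer different settings.
    """
    ideal_buffer = node_memory_mb * 0.1
    return abs(config["sort_buffer_mb"] - ideal_buffer) + abs(config["spill_percent"] - 80)


def tune_node(node_memory_mb, rounds=20, tasks_per_round=8, keep=4, seed=0):
    """Tune task settings for one node over several rounds of task execution."""
    rng = random.Random(seed)
    # Round 1 runs tasks with randomly selected configurations.
    population = [random_config(rng) for _ in range(tasks_per_round)]
    best_config, best_time = None, float("inf")
    for _ in range(rounds):
        # Rank this round's configurations by (simulated) task time.
        ranked = sorted(population, key=lambda c: simulated_task_time(c, node_memory_mb))
        round_best_time = simulated_task_time(ranked[0], node_memory_mb)
        if round_best_time < best_time:
            best_config, best_time = ranked[0], round_best_time
        # Keep the best performers and discard the rest.
        survivors = ranked[:keep]
        # Refill the next round with mutated crossovers of the survivors.
        children = []
        while len(survivors) + len(children) < tasks_per_round:
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = survivors + children
    return best_config, best_time


if __name__ == "__main__":
    # Two hypothetical nodes with different memory sizes are tuned independently.
    for memory_mb in (1000, 4000):
        config, cost = tune_node(memory_mb)
        print(memory_mb, config, cost)
```

Because each node is tuned separately in this sketch, nodes with different capabilities can converge to different settings, which mirrors the per-node customization the abstract argues for. Keeping the survivors unchanged between rounds means the best cost found never gets worse from one round to the next.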
