Improving Performance of Heterogeneous MapReduce Clusters with Adaptive Task Tuning

Datacenter-scale clusters are evolving toward heterogeneous hardware architectures due to continuous server replacement. Meanwhile, datacenters are commonly shared by many users for quite different uses. It often exhibits significant performance heterogeneity due to multi-tenant interferences. The deployment of MapReduce on such heterogeneous clusters presents significant challenges in achieving good application performance compared to in-house dedicated clusters. As most MapReduce implementations are originally designed for homogeneous environments, heterogeneity can cause significant performance deterioration in job execution despite existing optimizations on task scheduling and load balancing. In this paper, we observe that the homogeneous configuration of tasks on heterogeneous nodes can be an important source of load imbalance and thus cause poor performance. Tasks should be customized with different configurations to match the capabilities of heterogeneous nodes. To this end, we propose a self-adaptive task tuning approach, Ant, that automatically searches the optimal configurations for individual tasks running on different nodes. In a heterogeneous cluster, Ant first divides nodes into a number of homogeneous subclusters based on their hardware configurations. It then treats each subcluster as a homogeneous cluster and independently applies the self-tuning algorithm to them. Ant finally configures tasks with randomly selected configurations and gradually improves tasks configurations by reproducing the configurations from best performing tasks and discarding poor performing configurations. To accelerate task tuning and avoid trapping in local optimum, Ant uses genetic algorithm during adaptive task configuration. Experimental results on a heterogeneous physical cluster with varying hardware capabilities show that Ant improves the average job completion time by 31, 20, and 14 percent compared to stock Hadoop (Stock), customized Hadoop with industry recommendations (Heuristic), and a profiling-based configuration approach (Starfish), respectively. Furthermore, we extend Ant to virtual MapReduce clusters in a multi-tenant private cloud. Specifically, Ant characterizes a virtual node based on two measured performance statistics: I/O rate and CPU steal time. It uses k-means clustering algorithm to classify virtual nodes into configuration groups based on the measured dynamic interference. Experimental results on virtual clusters with varying interferences show that Ant improves the average job completion time by 20, 15, and 11 percent compared to Stock, Heuristic and Starfish, respectively.

[1]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[2]  Vinay Setty,et al.  Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) , 2010, Proc. VLDB Endow..

[3]  Liang Dong,et al.  Starfish: A Self-tuning System for Big Data Analytics , 2011, CIDR.

[4]  Changjun Jiang,et al.  Resource and Deadline-Aware Job Scheduling in Dynamic Hadoop Clusters , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.

[5]  Tom White,et al.  Hadoop: The Definitive Guide , 2009 .

[6]  Carlo Curino,et al.  Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.

[7]  Antony I. T. Rowstron,et al.  Scale-up vs scale-out for Hadoop: time to rethink? , 2013, SoCC.

[8]  Himabindu Pucha,et al.  Towards Optimizing Hadoop Provisioning in the Cloud , 2009, HotCloud.

[9]  Herodotos Herodotou,et al.  No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics , 2011, SoCC.

[10]  Hai Jin,et al.  Mammoth: Gearing Hadoop Towards Memory-Intensive MapReduce Applications , 2015, IEEE Transactions on Parallel and Distributed Systems.

[11]  Cong Xu,et al.  CooMR: Cross-task coordination for efficient data management in MapReduce programs , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[12]  Ling Liu,et al.  Cost-Effective Resource Provisioning for MapReduce in a Cloud , 2015, IEEE Transactions on Parallel and Distributed Systems.

[13]  Yong Cheng,et al.  Minimum Standard Deviation Difference-Based Thresholding , 2010, 2010 International Conference on Measuring Technology and Mechatronics Automation.

[14]  Chita R. Das,et al.  HybridMR: A Hierarchical MapReduce Scheduler for Hybrid Data Centers , 2013, 2013 IEEE 33rd International Conference on Distributed Computing Systems.

[15]  H. Howie Huang,et al.  TRACON: Interference-Aware Schedulingfor Data-Intensive Applicationsin Virtualized Environments , 2011, IEEE Transactions on Parallel and Distributed Systems.

[16]  Jordi Torres,et al.  Autonomic Placement of Mixed Batch and Transactional Workloads , 2012, IEEE Transactions on Parallel and Distributed Systems.

[17]  Li Zhang,et al.  MRONLINE: MapReduce online performance tuning , 2014, HPDC '14.

[18]  Herodotos Herodotou,et al.  Profiling, what-if analysis, and cost-based optimization of MapReduce programs , 2011, Proc. VLDB Endow..

[19]  Xiaobo Zhou,et al.  iShuffle: Improving Hadoop Performance with Shuffle-on-Write , 2017, IEEE Transactions on Parallel and Distributed Systems.

[20]  Cristina L. Abad,et al.  Natjam: Eviction Policies For Supporting Priorities and Deadlines in Mapreduce Clusters , 2013 .

[21]  Changjun Jiang,et al.  Heterogeneity-Aware Workload Placement and Migration in Distributed Sustainable Datacenters , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[22]  Changjun Jiang,et al.  Towards Energy Efficiency in Heterogeneous Hadoop Clusters by Adaptive Task Assignment , 2015, 2015 IEEE 35th International Conference on Distributed Computing Systems.

[23]  Cristina L. Abad,et al.  Natjam: design and evaluation of eviction policies for supporting priorities and deadlines in mapreduce clusters , 2013, SoCC.

[24]  T. N. Vijaykumar,et al.  Tarazu: optimizing MapReduce on heterogeneous clusters , 2012, ASPLOS XVII.

[25]  Changjun Jiang,et al.  FlexSlot: Moving Hadoop Into the Cloud with Flexible Slot Management , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[26]  Randy H. Katz,et al.  Improving MapReduce Performance in Heterogeneous Environments , 2008, OSDI.

[27]  Xiaobo Zhou,et al.  Improving MapReduce performance in heterogeneous environments with adaptive task tuning , 2014, Middleware.

[28]  Jordi Torres,et al.  Enabling Resource Sharing between Transactional and Batch Workloads Using Dynamic Application Placement , 2008, Middleware.

[29]  Jorge-Arnulfo Quiané-Ruiz,et al.  Trojan data layouts: right shoes for a running elephant , 2011, SoCC.

[30]  Kun-Lung Wu,et al.  FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads , 2010, Middleware.

[31]  Changjun Jiang,et al.  Elastic Power-Aware Resource Provisioning of Heterogeneous Workloads in Self-Sustainable Datacenters , 2016, IEEE Transactions on Computers.

[32]  Palden Lama,et al.  AROMA: automated resource allocation and configuration of mapreduce environment in the cloud , 2012, ICAC '12.