Resource Scheduling and Data Locality for Virtualized Hadoop on IaaS Cloud Platform

As cloud computing technology matures, there is a growing need to combine the big data processing tool Hadoop with IaaS cloud platforms. In this paper, we first propose a new Dynamic Hadoop Cluster on IaaS (DHCI) architecture, which includes four key modules: a monitoring module, a scheduling module, a virtual machine management module, and a virtual machine migration module. The monitoring module collects the load of both physical hosts and virtual machines, and this load information is used to design the resource scheduling and data locality solutions. Second, we present a load-feedback-based resource scheduling scheme. It avoids allocating resources on overburdened physical hosts and achieves strong scalability of the virtualized cluster by adjusting the number of virtual machines (VMs). Third, we reuse VM migration and propose a dynamic-migration-based data locality scheme: we migrate computation nodes to the host(s) or rack(s) where the corresponding storage nodes are deployed, so that the data locality requirement is satisfied. We evaluate our solutions in a realistic scenario based on OpenStack. Extensive experimental results demonstrate the effectiveness of our solutions, which help balance workload and improve performance even when the cloud system is heavily loaded.
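To make the two schemes more concrete, the following is a minimal Python sketch of how load feedback might steer VM placement and how a migration target might be chosen for data locality. It is an illustration under stated assumptions, not the DHCI implementation: the `Host` record, the 0.8 overload threshold, the normalized load value, and both function names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed cutoff above which a host counts as overburdened (not from the paper).
OVERLOAD_THRESHOLD = 0.8


@dataclass
class Host:
    name: str
    rack: str
    load: float  # normalized load in [0, 1], as a monitoring module might report it


def pick_host_for_new_vm(hosts: List[Host]) -> Optional[Host]:
    """Return the least-loaded host below the threshold, or None if all are overburdened."""
    candidates = [h for h in hosts if h.load < OVERLOAD_THRESHOLD]
    return min(candidates, key=lambda h: h.load, default=None)


def pick_migration_target(
    compute_host: Host, storage_host: Host, hosts: List[Host]
) -> Optional[Host]:
    """Choose where to migrate a computation VM so it sits closer to its storage node.

    Preference order: the storage node's own host, then another host in the
    storage node's rack. None means the VM should stay where it is.
    """
    if compute_host.name == storage_host.name:
        return None  # already on the same host as the data
    if storage_host.load < OVERLOAD_THRESHOLD:
        return storage_host
    if compute_host.rack == storage_host.rack:
        return None  # already rack-local and the storage host is overburdened
    same_rack = [
        h
        for h in hosts
        if h.rack == storage_host.rack
        and h.name != storage_host.name
        and h.load < OVERLOAD_THRESHOLD
    ]
    return min(same_rack, key=lambda h: h.load, default=None)


hosts = [Host("h1", "r1", 0.9), Host("h2", "r1", 0.3), Host("h3", "r2", 0.5)]
new_vm_host = pick_host_for_new_vm(hosts)
target = pick_migration_target(hosts[2], hosts[0], hosts)
```

In this sketch the same load threshold gates both decisions, so a migration made for locality never lands on a host that the scheduler would itself refuse to allocate on. A real system would also need to weigh migration cost, which is omitted here.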
