Energy- and locality-efficient multi-job scheduling based on MapReduce for heterogeneous datacenter

Job scheduling for MapReduce is a research hot spot, especially in heterogeneous datacenters, where huge energy consumption and operating costs are key challenges. Most previous work considers the scheduling optimization of only a single job. In this paper, we take multiple MapReduce jobs as the research object and focus on the goal of “jointly optimizing the scheduling time, job costs and energy consumption.” To that end, we develop an energy- and locality-efficient MapReduce multi-job scheduling algorithm for the heterogeneous datacenter. First, we use the rack as the basic unit of resource in job scheduling to reduce data communication between jobs and to facilitate energy savings. Second, according to the capacity of the heterogeneous racks, we design a multi-job pre-mapping method that optimizes the execution order of jobs and jointly optimizes the scheduling time, job costs and energy consumption. Based on this pre-mapping method, we can assign each job to virtual machines on a single rack, so as to minimize the number of online racks. This centralized mapping strategy helps save energy and reduce the data transmission of jobs. Third, the map and reduce tasks of a job are divided into multiple task groups for parallel execution, thereby further reducing data communication and energy consumption. Finally, extensive experimental results demonstrate the advantages of our algorithm.
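
To make the rack-level idea concrete, the following is a minimal illustrative sketch, not the algorithm developed in the paper. It assumes each job needs a fixed number of VM slots and each rack offers a fixed slot capacity, and it substitutes a plain first-fit-decreasing heuristic for the pre-mapping step: every job is placed whole on one rack, and partly filled racks are reused so that fewer racks stay online. The names `Rack`, `pre_map`, and `split_into_groups` are hypothetical, and the sketch ignores scheduling time and job costs.

```python
from dataclasses import dataclass, field


@dataclass
class Rack:
    """Hypothetical rack model: capacity is the number of VM slots it offers."""
    name: str
    capacity: int
    used: int = 0
    jobs: list = field(default_factory=list)


def pre_map(jobs, racks):
    """Place each job whole on a single rack (first-fit decreasing).

    jobs: dict mapping job name -> VM slots required (an assumed cost model).
    Returns a dict mapping job name -> rack name.
    """
    placement = {}
    # Try the largest racks first so that small racks can stay offline.
    ordered_racks = sorted(racks, key=lambda r: r.capacity, reverse=True)
    # Place the most demanding jobs first.
    for job, demand in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        for rack in ordered_racks:
            if rack.capacity - rack.used >= demand:
                rack.used += demand
                rack.jobs.append(job)
                placement[job] = rack.name
                break
        else:
            raise ValueError(f"no single rack can host job {job!r}")
    return placement


def split_into_groups(tasks, group_size):
    """Split a job's task list into consecutive groups of at most group_size."""
    return [tasks[i:i + group_size] for i in range(0, len(tasks), group_size)]
```

For example, with racks of 8, 4, and 4 slots and jobs needing 5, 3, and 4 slots, the sketch keeps two racks online and leaves the third unused. A real scheduler would also have to weigh job order, data locality inside the rack, and per-rack power draw, which this heuristic does not model.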
