A self-adaptive scheduling algorithm for reduce start time

MapReduce is one of the most successful realizations of large-scale, data-intensive cloud computing platforms. Deciding when to start the reduce tasks is a key problem in improving MapReduce performance. Existing implementations can leave reduce tasks blocked: when the output of the map tasks becomes large, the performance of a MapReduce scheduling algorithm is seriously affected. By analyzing the current MapReduce scheduling mechanism, this paper explains why system slot resources are wasted while reduce tasks wait idle, and proposes an optimal reduce scheduling policy called SARS (Self Adaptive Reduce Scheduling) for the start times of reduce tasks on the Hadoop platform. SARS decides the start time of each reduce task dynamically according to the context of each job, including the task completion time and the size of the map output. The evaluation measures job completion time, reduce completion time, and system average response time. The experimental results show that, compared with other algorithms, SARS decreases the reduce completion time sharply. They also show that the average response time decreases by 11% to 29% when SARS is applied to the traditional job scheduling algorithms FIFO, FairScheduler, and CapacityScheduler.
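The idea of delaying a reduce task until its slot would actually be put to use can be sketched as follows. This is a minimal illustration under stated assumptions, not the SARS model from the paper: the `JobContext` fields, the wave-based estimate of the remaining map time, and the single fixed copy bandwidth are all hypothetical simplifications introduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class JobContext:
    """Hypothetical per-job statistics; the field names are illustrative only."""
    maps_total: int               # number of map tasks in the job
    maps_done: int                # map tasks already finished
    avg_map_seconds: float        # mean completion time of finished map tasks
    map_slots: int                # map slots available to the job
    avg_map_output_bytes: float   # mean output one map task sends to one reducer
    copy_bytes_per_second: float  # assumed constant copy (shuffle) bandwidth


def remaining_map_seconds(ctx: JobContext) -> float:
    """Estimate time until the last map finishes, assuming maps run in full waves."""
    remaining = ctx.maps_total - ctx.maps_done
    waves = math.ceil(remaining / ctx.map_slots)
    return waves * ctx.avg_map_seconds


def copy_seconds(ctx: JobContext) -> float:
    """Estimate how long one reducer needs to copy its share of all map output."""
    return ctx.maps_total * ctx.avg_map_output_bytes / ctx.copy_bytes_per_second


def reduce_start_delay(ctx: JobContext) -> float:
    """Seconds to wait before launching the reducer.

    The reducer is launched just early enough that its copy phase is expected
    to end when the map phase ends, so it does not hold a slot while idle.
    """
    return max(0.0, remaining_map_seconds(ctx) - copy_seconds(ctx))
```

With small map output the sketch postpones the reduce task, and with large map output it starts the task immediately, because copying is then expected to take longer than the remaining map phase. Recomputing the delay as more map tasks finish would make the decision follow the job context dynamically.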
