Deadline-aware MapReduce scheduling with selective speculative execution

MapReduce is a well-known distributed and parallel programming model with extensive support for large-scale computing. However, its performance is limited by the default scheduler, which is not suited to heterogeneous environments and does not consider user constraints such as deadlines. This paper proposes a deadline-aware scheduling algorithm that selectively uses speculative execution when a job approaches its deadline in order to expedite the job's execution. The algorithm is implemented on a heterogeneous Hadoop cluster, and the evaluation shows a significant improvement in performance: across different workloads, both the number of jobs that missed their deadlines and the overall execution time were reduced.
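The core idea, enabling speculative execution only for jobs at risk of missing their deadline, can be illustrated with a minimal sketch. This is not the paper's algorithm: the `JobProgress` fields, the linear progress extrapolation, and the `slack_factor` parameter are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobProgress:
    """Hypothetical snapshot of a running job (all fields are assumptions)."""
    elapsed_s: float   # seconds since the job was submitted
    deadline_s: float  # user deadline, in seconds after submission
    progress: float    # fraction of work completed, 0.0 to 1.0


def estimated_remaining_s(job: JobProgress) -> float:
    """Estimate remaining time by assuming progress continues at the same rate."""
    if job.progress <= 0.0:
        return float("inf")  # no progress yet, so no finite estimate
    return job.elapsed_s * (1.0 - job.progress) / job.progress


def should_speculate(job: JobProgress, slack_factor: float = 1.0) -> bool:
    """Return True when the job is projected to miss its deadline.

    slack_factor > 1.0 makes the check more conservative, so speculation
    is switched on earlier.
    """
    if job.progress >= 1.0:
        return False  # finished jobs need no speculative tasks
    time_left = job.deadline_s - job.elapsed_s
    return estimated_remaining_s(job) * slack_factor > time_left
```

A job that is half done after 50 s with a 200 s deadline would not trigger speculation under this rule, whereas a job that is half done after 150 s would. A real scheduler would likely need per-task progress and node speed estimates, which this sketch omits.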
