Preemptive Hadoop Jobs Scheduling under a Deadline

MapReduce has become the dominant programming model for cloud-based data processing environments such as Hadoop. First In First Out (FIFO) is Hadoop's default job scheduling policy, but it cannot guarantee that a job will complete by a specific deadline. Prior research on deadline-based MapReduce schedulers has focused on the non-preemptive scheduling approach. However, preemptive scheduling has advantages over non-preemptive scheduling in terms of total completion time and slot utilization. In this paper, we first formulate the preemptive scheduling problem under a deadline constraint and then propose preemptive scheduling algorithms. To our knowledge, we have implemented the first real preemptive job scheduler for meeting deadlines on Hadoop. The experimental results indicate that the preemptive scheduling approach is promising: it is more efficient than the non-preemptive one for executing jobs under a given deadline.
