TDWS: A Job Scheduling Algorithm Based on MapReduce

As organizations start to use data-intensive cluster computing systems such as Hadoop MapReduce to handle large-scale data, job scheduling becomes very important for achieving efficiency. In the default implementation of Hadoop MapReduce, jobs are scheduled in FIFO order. This easily starves small jobs when resources are occupied by large jobs, while the Fair Scheduler is inefficient when handling large jobs and leads to the sticky slots problem. In this paper, we propose a new job scheduling algorithm, TDWS. The algorithm takes into account the characteristics of different applications in order to meet their different needs. In addition, it is highly robust to heterogeneity and makes it easy to achieve optimal data locality. Experiments demonstrate the feasibility and efficiency of our solution.
