Job Scheduling Optimization for Multi-user MapReduce Clusters

A shared MapReduce cluster is well suited to building a data warehouse that serves multiple users. The FAIR scheduler gives each user the illusion of owning a private cluster, and it can dynamically redistribute capacity left unused by some users to other users. However, when reassigning slots, FAIR kills the most recently launched tasks without considering job characteristics or data locality, which increases network traffic when the killed Map/Reduce tasks are rescheduled. Building on FAIR scheduling, the paper proposes an improved FAIR scheduling algorithm that takes job characteristics and data locality into account when killing tasks to free slots for new users. Performance evaluation results demonstrate that the improved FAIR reduces data movement and speeds up job execution, consequently improving overall system performance.
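The abstract does not specify how job characteristics and data locality are weighed when choosing which tasks to kill. The Python below is a minimal sketch under assumptions made here for illustration only; it is not the paper's algorithm. `fair_victims` mirrors the baseline behavior described above (kill the most recently launched tasks). `locality_aware_victims` is a hypothetical alternative that scores each task by the network traffic its re-execution would cause plus the running time lost by killing it. The names `Task`, `refetch_mb`, `can_run_local`, and `mb_per_second` are all hypothetical, and the traffic model (a restarted reducer re-fetches its whole shuffle input; a map only costs traffic if it cannot be rescheduled on a node holding its input) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    task_id: str
    kind: str            # "map" or "reduce"
    start_time: float    # launch time, in seconds
    input_mb: float      # data the task must read again if it is re-executed
    can_run_local: bool  # hypothetical: a free slot exists on a node holding a replica of the map input


def fair_victims(tasks: List[Task], n: int) -> List[Task]:
    """Baseline FAIR preemption: kill the n most recently launched tasks."""
    return sorted(tasks, key=lambda t: t.start_time, reverse=True)[:n]


def refetch_mb(t: Task) -> float:
    """Estimated network traffic caused by re-executing t (hypothetical model)."""
    if t.kind == "reduce":
        # Assumption: a restarted reducer re-fetches its shuffle input from remote map nodes.
        return t.input_mb
    # Assumption: a map task only costs network traffic if it cannot be rescheduled data-locally.
    return 0.0 if t.can_run_local else t.input_mb


def locality_aware_victims(tasks: List[Task], n: int, now: float,
                           mb_per_second: float = 1.0) -> List[Task]:
    """Kill the n tasks with the lowest combined network and lost-work cost.

    mb_per_second is a hypothetical weight that converts lost running time
    into MB-equivalents so both costs can be summed.
    """
    def cost(t: Task) -> float:
        lost_work_s = now - t.start_time
        return refetch_mb(t) + mb_per_second * lost_work_s

    return sorted(tasks, key=cost)[:n]


if __name__ == "__main__":
    tasks = [
        Task("r1", "reduce", 90.0, 512.0, False),
        Task("m1", "map", 80.0, 128.0, True),
        Task("m2", "map", 85.0, 128.0, False),
        Task("m3", "map", 10.0, 128.0, True),
    ]
    print([t.task_id for t in fair_victims(tasks, 2)])
    print([t.task_id for t in locality_aware_victims(tasks, 2, now=100.0)])
```

With these sample tasks, the baseline picks `r1` and `m2`, the two newest tasks, even though restarting them moves 640 MB over the network under the assumed model. The sketch instead picks `m1` and `m3`, which can be rescheduled data-locally. The single weighted sum is one simple design choice; a real policy might also cap lost work so that long-running tasks are not sacrificed.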
