A Delay Scheduling Algorithm Based on History Time in Heterogeneous Environments

The MapReduce framework was designed for data-intensive computing. In recent years, sharing a single Hadoop cluster among many users has become common in many companies, so an efficient scheduling algorithm that balances the utilization and the parallelism of the cluster has become very important. Hadoop's multi-user schedulers (the Fair scheduler and the Delay scheduler) were designed for homogeneous environments and perform poorly in heterogeneous ones. In this paper, we propose a new scheduling algorithm for multi-user Hadoop clusters that combines the history time of completed tasks with the Delay scheduler's strategy, with the aim of achieving good performance while guaranteeing fairness in a shared heterogeneous environment. Our algorithm is implemented in Hadoop 0.21.1, and the experiment demonstrates its validity.