Evaluating task scheduling in Hadoop-based cloud systems

Private clouds are now widely used for resource sharing, and Hadoop-based clusters are the most popular implementations of private clouds. However, because workload traces are not publicly available, few previous studies have compared and evaluated different cloud solutions using publicly available benchmarks. In this paper, we use a recently released cloud benchmark suite, CloudRank-D, to quantitatively evaluate five Hadoop task schedulers: FIFO, capacity, naïve fair sharing, fair sharing with delay, and HOD (Hadoop On Demand) scheduling. Our experiments show that, with an appropriate scheduler, the throughput of a private cloud can be improved by 20%.