Multiple-Job Optimization in MapReduce for Heterogeneous Workloads

MapReduce clusters are emerging as a solution for data-intensive scalable computing. The open-source implementation Hadoop has already been adopted to build clusters containing thousands of nodes. Such cloud infrastructure is used to process many different jobs simultaneously, each depending on different hardware resources such as memory, CPU, disk I/O, and network I/O. If the scheduling policy does not consider the heterogeneity of the running jobs’ resource utilization types, resource contention may occur. In this paper, we analyze this multiple-job parallelization problem in MapReduce and propose the multiple-job optimization (MJO) scheduler. Our scheduler detects each job’s resource utilization type on the fly and improves hardware utilization by running different kinds of jobs in parallel. We present two scenarios, “same plan” and “same job”, to illustrate the submission traces of multiple jobs in MapReduce clusters. Our experiments show that in these scenarios the MJO scheduler can reduce the makespan by about 20%.
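The idea of co-scheduling jobs with different resource utilization types can be illustrated with a minimal sketch. It is not the MJO scheduler itself: the function names, the utilization dictionaries, and the greedy grouping rule are all assumptions made for illustration. Each job is labeled by the resource it uses most, and jobs with different labels are grouped so that they can run in parallel without contending for the same resource.

```python
from collections import defaultdict, deque

# Resource types named in the abstract; the identifiers are illustrative.
RESOURCES = ("cpu", "memory", "disk_io", "network_io")


def dominant_resource(utilization):
    """Return the resource with the highest sampled utilization (0.0 to 1.0).

    Hypothetical classifier: a missing resource counts as 0.0, and ties
    are broken by the order of RESOURCES.
    """
    return max(RESOURCES, key=lambda r: utilization.get(r, 0.0))


def pair_heterogeneous(jobs):
    """Greedily group jobs so that co-scheduled jobs stress different resources.

    `jobs` is a list of (job_id, utilization_dict) in submission order.
    Returns a list of batches; each batch holds at most one job per
    resource type. This grouping rule is an assumption, not the paper's.
    """
    queues = defaultdict(deque)
    for job_id, util in jobs:
        queues[dominant_resource(util)].append(job_id)

    batches = []
    while any(queues.values()):
        # Take the oldest waiting job of each resource type, if any.
        batch = [queues[r].popleft() for r in RESOURCES if queues[r]]
        batches.append(batch)
    return batches


if __name__ == "__main__":
    # Made-up job names and utilization samples, for demonstration only.
    submitted = [
        ("sort", {"disk_io": 0.9, "cpu": 0.3}),
        ("pi", {"cpu": 0.95}),
        ("grep", {"disk_io": 0.8, "cpu": 0.4}),
        ("shuffle", {"network_io": 0.7, "cpu": 0.2}),
    ]
    for batch in pair_heterogeneous(submitted):
        print(batch)
```

In this sketch the two disk-bound jobs land in different batches, while the CPU-bound and network-bound jobs share a batch with the first disk-bound job.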