Workload Characteristic Oriented Scheduler for MapReduce

Applications in many areas are increasingly developed and ported using the MapReduce framework (more specifically, Hadoop) to exploit data parallelism. The application scope of MapReduce has been extended beyond its original design goal of large-scale data processing. This extension creates a need for the scheduler to explicitly take job characteristics into account in pursuit of two main goals: efficient resource use and performance improvement. In this paper, we study MapReduce scheduling strategies that deal effectively with two different workload characteristics, CPU-intensive and I/O-intensive. We present the Workload Characteristic Oriented Scheduler (WCO), which strives to co-locate tasks of possibly different MapReduce jobs that have complementary resource usage characteristics. WCO makes its scheduling decisions dynamically and adaptively, using information obtained from its characteristic estimator. The workload characteristics of tasks are estimated primarily by sampling, with the help of static task selection strategies such as Java bytecode analysis. Results from extensive experiments using 11 benchmarks on a 4-node local cluster and a 51-node Amazon EC2 cluster show a 17% average improvement in throughput when diverse workloads co-exist.
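The co-location idea can be illustrated with a minimal sketch. This is not WCO's actual algorithm: it assumes each task has already been labeled "cpu" or "io" by some estimator, and the names Task, Node, pick_task, and schedule are hypothetical, introduced only for illustration. Each node with a free slot receives the pending task whose label is least represented among the tasks it already runs.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Task:
    job: str
    kind: str  # "cpu" or "io"; assumed to come from a characteristic estimator


@dataclass
class Node:
    name: str
    slots: int
    running: list = field(default_factory=list)

    def free_slots(self):
        return self.slots - len(self.running)


def pick_task(node, pending):
    """Return the pending task whose kind is least represented on the node."""
    if node.free_slots() <= 0 or not pending:
        return None
    load = Counter(t.kind for t in node.running)
    # min() keeps queue order on ties, so earlier tasks are preferred.
    return min(pending, key=lambda t: load[t.kind])


def schedule(nodes, pending):
    """Assign tasks round by round until slots or tasks run out."""
    pending = list(pending)
    placement = {}
    progress = True
    while pending and progress:
        progress = False
        for node in nodes:
            task = pick_task(node, pending)
            if task is None:
                continue
            pending.remove(task)
            node.running.append(task)
            placement.setdefault(node.name, []).append(task.kind)
            progress = True
    return placement, pending


if __name__ == "__main__":
    nodes = [Node("n1", 2), Node("n2", 2)]
    queue = [Task("A", "cpu"), Task("A", "cpu"), Task("B", "io"), Task("B", "io")]
    placement, leftover = schedule(nodes, queue)
    print(placement, leftover)
```

With two 2-slot nodes and a queue of two CPU-intensive and two I/O-intensive tasks, the sketch places one task of each kind on each node instead of pairing like with like. A real scheduler would also have to weigh data locality, fairness, and estimation error, which this sketch ignores.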
