FRESH: Fair and Efficient Slot Configuration and Scheduling for Hadoop Clusters

Hadoop is an emerging framework for parallel big data processing. Despite its growing popularity, Hadoop is complex enough that regular users rarely understand all of its system parameters or tune them appropriately. Especially when processing a batch of jobs, the default Hadoop settings may cause inefficient resource utilization and unnecessarily prolong execution time. This paper considers one particularly important setting, the slot configuration, which by default is fixed and static. We propose an enhanced Hadoop system called FRESH, which derives the best slot setting, dynamically configures slots, and appropriately assigns tasks to the available slots. Experimental results show that when serving a batch of MapReduce jobs, FRESH significantly improves the makespan as well as the fairness among jobs.
