COSHH: A classification and optimization based scheduler for heterogeneous Hadoop systems

A Hadoop system executes and multiplexes many tasks in a shared datacenter. Demand for sharing Hadoop clusters among many users is rising, and this sharing increases system heterogeneity. However, most Hadoop schedulers neglect heterogeneity. In this work, we design and implement a new Hadoop scheduling system, named COSHH, which considers heterogeneity at both the application and the cluster level. The main objective of COSHH is to reduce the mean completion time of jobs. Because it also takes other key Hadoop performance metrics into account, COSHH achieves performance competitive with other well-known Hadoop schedulers in terms of minimum share satisfaction, fairness, and locality.
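To make the mean-completion-time objective and the cost of ignoring heterogeneity concrete, here is a toy Python sketch. It is not COSHH's algorithm. It assumes two hypothetical machine types (`fast_cpu`, `fast_io`) with made-up per-class execution times, assumes all jobs arrive at time 0 and each machine runs its jobs one after another, and compares a heterogeneity-oblivious round-robin policy with a simple greedy earliest-finish policy. All names and numbers are illustrative assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical per-class execution times (seconds) on each machine type.
# Heterogeneity: each machine is fast for one job class and slow for the other.
EXEC_TIME: Dict[str, Dict[str, float]] = {
    "fast_cpu": {"cpu": 2.0, "io": 6.0},
    "fast_io": {"cpu": 6.0, "io": 2.0},
}

# A policy picks a machine given the job's index, its class, and current machine loads.
Policy = Callable[[int, str, Dict[str, float]], str]


def round_robin(index: int, job_class: str, load: Dict[str, float]) -> str:
    """Heterogeneity-oblivious: cycle through machines in a fixed order."""
    machines = sorted(EXEC_TIME)
    return machines[index % len(machines)]


def earliest_finish(index: int, job_class: str, load: Dict[str, float]) -> str:
    """Heterogeneity-aware greedy: pick the machine where this job would finish first."""
    return min(EXEC_TIME, key=lambda m: load[m] + EXEC_TIME[m][job_class])


def mean_completion_time(jobs: List[str], policy: Policy) -> float:
    """All jobs arrive at t=0; each machine runs its assigned jobs sequentially."""
    load = {m: 0.0 for m in EXEC_TIME}  # time at which each machine becomes free
    completions = []
    for i, job_class in enumerate(jobs):
        machine = policy(i, job_class, load)
        load[machine] += EXEC_TIME[machine][job_class]
        completions.append(load[machine])  # this job's completion time
    return sum(completions) / len(completions)


jobs = ["io", "cpu", "io", "cpu"]
print(mean_completion_time(jobs, round_robin))      # 9.0
print(mean_completion_time(jobs, earliest_finish))  # 3.0
```

Round robin places each job on the machine that is slow for its class, so the mean completion time is (6 + 6 + 12 + 12) / 4 = 9.0 seconds; the greedy policy matches classes to suitable machines and gets (2 + 2 + 4 + 4) / 4 = 3.0 seconds. The greedy rule is only a stand-in chosen for brevity; COSHH itself is described as a classification and optimization based scheduler.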