Cluster fair queueing: Speeding up data-parallel jobs with delay guarantees

The cluster scheduler is a critical component of data-parallel systems in datacenters. Ideally, a scheduler should provide predictable performance, with a guaranteed bound on the maximal job completion delay, while also minimizing the mean response time. In practice, however, predictability and optimality often conflict. The result is a plethora of scheduling policies that either achieve predictable performance at the expense of long response times (e.g., max-min fairness) or risk starving some jobs in order to minimize the mean response time (e.g., Shortest Remaining Processing Time First). To address these problems, we develop a new scheduler, Cluster Fair Queueing (CFQ), which preferentially offers resources to the jobs that would complete earliest under a fair sharing policy. We show that CFQ is able to minimize the mean response time while ensuring that each job finishes within a constant time of its completion time under fair sharing. Our Spark deployment on a 100-node EC2 cluster demonstrates that, compared to the built-in fair scheduler, CFQ decreases the mean response time by 40% and speeds up more than 40% of jobs by over 75% on average.
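
The core idea, ordering jobs by when they would finish under fair sharing, can be illustrated with a toy model. The sketch below is not the CFQ implementation: it assumes a single resource of fixed capacity, jobs that all arrive at time 0 with known sizes, and equal-share fair sharing. The function names `fair_share_finish_times` and `priority_finish_times` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List


def fair_share_finish_times(sizes: Dict[str, float],
                            capacity: float = 1.0) -> Dict[str, float]:
    """Finish time of each job when all unfinished jobs split capacity equally.

    Toy assumption: every job arrives at time 0 and its size is known.
    """
    finish: Dict[str, float] = {}
    now = 0.0
    served = 0.0  # work each still-active job has received so far
    by_size = sorted(sizes.items(), key=lambda kv: kv[1])
    active = len(by_size)
    for name, size in by_size:
        # Each active job runs at rate capacity / active until the
        # smallest remaining job has received its full size.
        now += (size - served) * active / capacity
        served = size
        finish[name] = now
        active -= 1
    return finish


def priority_finish_times(sizes: Dict[str, float],
                          order: List[str],
                          capacity: float = 1.0) -> Dict[str, float]:
    """Finish times when jobs run one at a time, at full capacity, in `order`."""
    finish: Dict[str, float] = {}
    now = 0.0
    for name in order:
        now += sizes[name] / capacity
        finish[name] = now
    return finish


# Hypothetical workload: job name -> amount of work.
jobs = {"a": 1.0, "b": 2.0, "c": 6.0}

fair = fair_share_finish_times(jobs)
# Serve jobs in order of their finish time under fair sharing.
order = sorted(jobs, key=lambda j: fair[j])
prioritized = priority_finish_times(jobs, order)

mean_fair = sum(fair.values()) / len(fair)
mean_prioritized = sum(prioritized.values()) / len(prioritized)
print(fair, prioritized, mean_fair, mean_prioritized)
```

In this toy setting, no job finishes later than it would under fair sharing, and the mean finish time drops. A real cluster has multiple resources, online arrivals, and uncertain job sizes, none of which this sketch models.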
