Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale

As clusters continue to grow in size and complexity, providing scalable and predictable performance is an increasingly important challenge. A crucial roadblock to achieving predictable performance is stragglers, i.e., tasks that take significantly longer than expected to run. Speculative execution has been widely adopted to mitigate the impact of stragglers. However, speculation mechanisms are designed and operated independently of job scheduling, even though scheduling a speculative copy of a task directly reduces the resources available to other jobs. In this work, we present Hopper, a speculation-aware job scheduler, i.e., one that integrates the tradeoffs associated with speculation into job scheduling decisions. We implement both centralized and decentralized prototypes of the Hopper scheduler and show that coordinating scheduling and speculation yields improvements of 50% over state-of-the-art centralized schedulers and speculation strategies, and of 66% over their decentralized counterparts.
