heSRPT: Optimal Parallel Scheduling of Jobs With Known Sizes

When parallelizing a set of jobs across many servers, one must balance granting priority to short jobs against maintaining the overall efficiency of the system. When the goal is to minimize the mean flow time of a set of jobs, one usually wants to complete short jobs before long jobs. However, since jobs usually cannot be parallelized with perfect efficiency, granting strict priority to short jobs can result in very low system efficiency, which in turn hurts the mean flow time across jobs. In this paper, we derive the optimal policy for allocating servers to jobs at every moment in time in order to minimize mean flow time across jobs. We assume that jobs follow a sublinear, concave speedup function, and hence jobs experience diminishing returns from being allocated additional servers. We show that the optimal policy, heSRPT, completes jobs in order of their size, shortest first, but maintains overall system efficiency by allocating some servers to each job at every moment in time. We compare heSRPT with state-of-the-art allocation policies from the literature and show that heSRPT outperforms its competitors by at least 30%, and often by much more.
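
The trade-off described above can be illustrated with a small sketch. It assumes a speedup function of the form s(k) = k^p with 0 < p <= 1, which is one example of a sublinear, concave speedup and is an assumption made here for illustration. The two policies compared, `strict_priority` and `equal_split`, are simple hypothetical baselines; neither is heSRPT, and the function names are not taken from the paper.

```python
def mean_flow_time(sizes, n_servers, p, allocate):
    """Mean flow time of jobs that all arrive at time 0.

    Assumes each job runs at rate (servers allocated) ** p, an
    illustrative concave speedup. `allocate(m)` returns the fraction of
    servers given to each of the m remaining jobs, ordered from smallest
    to largest remaining size.
    """
    remaining = sorted(sizes)
    n_jobs = len(remaining)
    now = 0.0
    total_flow = 0.0
    while remaining:
        shares = allocate(len(remaining))
        rates = [(share * n_servers) ** p for share in shares]
        # Allocations stay fixed until the next job completes.
        dt = min(r / rate for r, rate in zip(remaining, rates) if rate > 0)
        now += dt
        remaining = [r - rate * dt for r, rate in zip(remaining, rates)]
        finished = sum(1 for r in remaining if r <= 1e-9)
        total_flow += now * finished
        remaining = sorted(r for r in remaining if r > 1e-9)
    return total_flow / n_jobs


def strict_priority(m):
    # Hypothetical baseline: every server goes to the smallest job.
    return [1.0] + [0.0] * (m - 1)


def equal_split(m):
    # Hypothetical baseline: servers are divided evenly among all jobs.
    return [1.0 / m] * m


if __name__ == "__main__":
    for p in (1.0, 0.5):
        a = mean_flow_time([1.0, 1.0], 4, p, strict_priority)
        b = mean_flow_time([1.0, 1.0], 4, p, equal_split)
        print(f"p={p}: strict priority {a:.4f}, equal split {b:.4f}")
```

For two unit-size jobs on four servers, strict priority gives the lower mean flow time when jobs parallelize perfectly (p = 1), while the equal split does better at p = 0.5. This is the tension between prioritizing short jobs and keeping the system efficient that an optimal policy has to resolve.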
