Towards Optimality in Parallel Scheduling

To keep pace with Moore's law, chip designers have focused on increasing the number of cores per chip rather than single-core performance. In turn, modern jobs are often designed to run on any number of cores. However, to effectively leverage these multi-core chips, one must address the question of how many cores to assign to each job. Given that jobs receive sublinear speedups from additional cores, there is an obvious tradeoff: allocating more cores to an individual job reduces the job's runtime, but in turn decreases the efficiency of the overall system. We ask how the system should schedule jobs across cores so as to minimize the mean response time over a stream of incoming jobs. To answer this question, we develop an analytical model of jobs running on a multi-core machine. We prove that EQUI, a policy which continuously divides cores evenly across jobs, is optimal when all jobs follow a single speedup curve and have exponentially distributed sizes. EQUI requires jobs to change their level of parallelization while they run. Since this is not possible for all workloads, we consider a class of "fixed-width" policies, which choose a single level of parallelization, k, to use for all jobs. We prove that, surprisingly, it is possible to achieve EQUI's performance without requiring jobs to change their levels of parallelization by using the optimal fixed level of parallelization, k*. We also show how to analytically derive the optimal k* as a function of the system load, the speedup curve, and the job size distribution. In the case where jobs may follow different speedup curves, finding a good scheduling policy is even more challenging. In particular, we find that policies like EQUI, which performed well in the case of a single speedup function, now perform poorly. We propose a very simple policy, GREEDY*, which performs near-optimally when compared to the numerically derived optimal policy.
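The tradeoff described above can be explored numerically with a small sketch. Everything in it is an illustrative assumption rather than the paper's model: Poisson arrivals at rate `lam`, exponential job sizes with rate `mu`, a hypothetical speedup curve s(k) = k^p for k > 1 (linear below one core), fractional core shares under EQUI, and a simplified fixed-width variant in which each running job holds k cores exclusively while the rest wait in a first-come-first-served queue. Under these assumptions both policies reduce to birth-death chains, and mean response time follows from Little's law. The fixed-width variant here may differ from the construction the paper analyzes, so its numbers should not be read as a check of the paper's results.

```python
def speedup(k, p=0.5):
    """Assumed sublinear speedup curve: linear up to one core, k**p beyond."""
    return k if k <= 1 else k ** p


def mean_response_time(lam, departure_rate, max_jobs=2000):
    """Mean response time of a birth-death chain, via Little's law.

    lam            -- arrival rate (assumed Poisson)
    departure_rate -- function mapping the number of jobs i >= 1 to the
                      total completion rate in that state
    max_jobs       -- truncation level for the (infinite) state space
    """
    weights = [1.0]  # unnormalized stationary probabilities
    for i in range(1, max_jobs + 1):
        weights.append(weights[-1] * lam / departure_rate(i))
    total = sum(weights)
    mean_jobs = sum(i * w for i, w in enumerate(weights)) / total
    return mean_jobs / lam


def equi_rate(i, n, mu=1.0, p=0.5):
    """EQUI: i jobs each receive n/i cores (fractional shares assumed)."""
    return i * mu * speedup(n / i, p)


def fixed_width_rate(i, n, k, mu=1.0, p=0.5):
    """Simplified fixed-width variant: at most n // k jobs run, k cores each."""
    return min(i, n // k) * mu * speedup(k, p)


if __name__ == "__main__":
    n, lam = 16, 3.0  # hypothetical system size and load
    print("EQUI:", mean_response_time(lam, lambda i: equi_rate(i, n)))
    for k in (1, 2, 4, 8, 16):
        t = mean_response_time(lam, lambda i, k=k: fixed_width_rate(i, n, k))
        print(f"fixed width k={k}:", t)
```

Scanning `k` for different values of `lam` and `p` shows how the best fixed width in this toy model shifts with the load and the speedup curve. Note that a width k is only stable here when `lam` is below `(n // k) * mu * speedup(k)`.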
