Adaptive Scheduling with Parallelism Feedback

Multiprocessor scheduling in a shared multiprogramming environment can be structured as two-level scheduling, in which a kernel-level job scheduler allots processors to jobs and a user-level thread scheduler schedules the work of a job on its allotted processors. In this context, the number of processors allotted to a particular job may vary during the job's execution, and the thread scheduler must adapt to these changes in processor resources. For overall system efficiency, the thread scheduler should also provide parallelism feedback to the job scheduler so that a job is not allotted more processors than it can use productively. This paper provides an overview of several adaptive thread schedulers we have developed that provide provably good history-based feedback about a job's parallelism without knowing the job's future behavior. These thread schedulers complete the job in near-optimal time while guaranteeing low waste. We have analyzed them under stringent adversarial conditions, showing that they are robust to various system environments and allocation policies. To analyze the thread schedulers under this adversarial model, we developed a new technique, called trim analysis, which can be used to show that a thread scheduler behaves well on the vast majority of time steps and performs poorly on only a few. When our thread schedulers are used with dynamic equipartitioning and related job scheduling algorithms, they are O(1)-competitive against an optimal offline scheduling algorithm with respect to mean response time for batched jobs and with respect to makespan for nonbatched jobs. Our algorithms are the first nonclairvoyant scheduling algorithms to offer such guarantees.
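The two-level feedback loop described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithms: the thread scheduler's history-based feedback is modeled as a multiplicative increase/decrease of the job's processor request (its "desire"), driven by a hypothetical utilization threshold `delta` and responsiveness factor `rho`; the job scheduler is dynamic equipartitioning (DEQ), which grants each job whose request fits within the fair share its full request and splits the remaining processors evenly among the other jobs. The job model in `simulate` (job `j` can keep at most `parallelism[j]` processors busy) and all function names are hypothetical.

```python
# Minimal sketch of two-level scheduling with parallelism feedback.
# Assumptions (hypothetical, not taken from the paper):
#   - time is divided into quanta of `quantum_len` steps;
#   - the thread scheduler doubles/halves its processor request ("desire")
#     based on the previous quantum's utilization (threshold `delta`, factor `rho`);
#   - the job scheduler is dynamic equipartitioning (DEQ).


def next_desire(desire: int, allotment: int, useful_work: int,
                quantum_len: int, delta: float = 0.8, rho: int = 2) -> int:
    """Thread-scheduler feedback: compute the next request from last quantum's history."""
    available = allotment * quantum_len          # processor-steps the job was given
    if useful_work < delta * available:          # inefficient: too many processors idled
        return max(1, desire // rho)
    if allotment >= desire:                      # efficient and satisfied: ask for more
        return desire * rho
    return desire                                # efficient but deprived: keep request


def deq(desires: dict[str, int], processors: int) -> dict[str, int]:
    """Job scheduler: dynamic equipartitioning of `processors` among jobs."""
    allot: dict[str, int] = {}
    remaining = processors
    pending = sorted(desires, key=lambda j: desires[j])  # smallest request first (stable)
    while pending:
        share = remaining // len(pending)
        job = pending[0]
        if desires[job] <= share:                # request fits in the fair share
            allot[job] = desires[job]
            remaining -= desires[job]
            pending.pop(0)
        else:                                    # every remaining job wants more: split evenly
            extra = remaining - share * len(pending)
            for i, j in enumerate(pending):
                allot[j] = share + (1 if i < extra else 0)
            break
    return allot


def simulate(parallelism: dict[str, int], processors: int,
             quanta: int, quantum_len: int = 10) -> list[dict[str, int]]:
    """Run the feedback loop; return the allotment chosen in each quantum."""
    desires = {j: 1 for j in parallelism}        # every job starts by requesting 1 processor
    history = []
    for _ in range(quanta):
        allot = deq(desires, processors)
        history.append(allot)
        for j, a in allot.items():
            # hypothetical job model: job j can keep at most parallelism[j] processors busy
            useful = min(a, parallelism[j]) * quantum_len
            desires[j] = next_desire(desires[j], a, useful, quantum_len)
    return history


if __name__ == "__main__":
    for q, allot in enumerate(simulate({"A": 2, "B": 16}, processors=8, quanta=6), 1):
        print(q, allot)
```

With these hypothetical inputs, job A's allotment alternates between 2 and 4 processors from the third quantum on: doubling overshoots its parallelism of 2, the resulting low-utilization quantum halves the request, and DEQ hands the processors A does not request to job B. Note that neither scheduler looks ahead; every decision uses only the previous quantum's allotment and utilization.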
