Improved Utilization and Responsiveness with Gang Scheduling

Most commercial multicomputers use space-slicing schemes in which each scheduling decision has an unknown impact on the future: should a job be scheduled, risking that it will block other, larger jobs later, or should the processors be left idle for now in anticipation of future arrivals? Gang scheduling resolves this dilemma, because the impact of each decision is limited to its own time slice, and future arrivals can be accommodated in other time slices. This added flexibility is shown to improve overall system utilization and responsiveness. Empirical evidence from using gang scheduling on a Cray T3D installed at Lawrence Livermore National Laboratory corroborates these results, and shows conclusively that gang scheduling can be very effective with current technology.
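
The idea that a decision only affects its own time slice can be illustrated with a small sketch. The code below is not the scheduler evaluated in the paper; it is a minimal Python illustration that assumes a matrix with one row per time slice and one column per processor, first-fit placement, and non-contiguous processor allocation. The names `GangMatrix`, `place`, and `utilization` are hypothetical.

```python
from typing import List, Optional


class GangMatrix:
    """Illustrative scheduling matrix: rows are time slices, columns are processors."""

    def __init__(self, num_procs: int) -> None:
        self.num_procs = num_procs
        # One row per time slice; each cell holds a job name or None if idle.
        self.slots: List[List[Optional[str]]] = []

    def place(self, job: str, size: int) -> int:
        """Place `job` on `size` processors and return the time slice index used.

        Assumption: first-fit over existing slices, with processors chosen
        non-contiguously. A new slice is opened if no existing slice has room.
        """
        if not 0 < size <= self.num_procs:
            raise ValueError("job size must be between 1 and num_procs")
        for index, row in enumerate(self.slots):
            free = [p for p, owner in enumerate(row) if owner is None]
            if len(free) >= size:
                for p in free[:size]:
                    row[p] = job
                return index
        # No existing slice can hold the job: open a new time slice for it.
        new_row: List[Optional[str]] = [None] * self.num_procs
        for p in range(size):
            new_row[p] = job
        self.slots.append(new_row)
        return len(self.slots) - 1

    def utilization(self) -> float:
        """Fraction of (time slice, processor) cells that are occupied."""
        if not self.slots:
            return 0.0
        busy = sum(1 for row in self.slots for owner in row if owner is not None)
        return busy / (len(self.slots) * self.num_procs)
```

On a hypothetical 8-processor machine, placing a 6-processor job first does not block a later 8-processor job: the larger job simply gets a time slice of its own, and a subsequent 2-processor job can still fill the gap left in the first slice. Under pure space slicing, the 8-processor job would have had to wait for the 6-processor job to finish.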
