Parallel Job Scheduling under Dynamic Workloads

Jobs that run on parallel systems that use gang scheduling for multiprogramming may interact with each other in various ways. These interactions are affected by system parameters such as the level of multiprogramming and the scheduling time quantum. A careful evaluation is therefore required in order to find parameter values that lead to optimal performance. We perform a detailed performance evaluation of three factors affecting scheduling systems running dynamic workloads: multiprogramming level, time quantum, and the use of backfilling for queue management — and how they depend on offered load. Our evaluation is based on synthetic MPI applications running on a real cluster that actually implements the various scheduling schemes. Our results demonstrate the importance of both components of the gang-scheduling plus backfilling combination: gang scheduling reduces response time and slowdown, and backfilling allows doing so with a limited multiprogramming level. This is further improved by using flexible coscheduling rather than strict gang scheduling, as this reduces the constraints and allows for a denser packing.

[1]  D J Evans,et al.  Parallel processing , 1986 .

[2]  Dror G. Feitelson,et al.  Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling , 2001, IEEE Trans. Parallel Distributed Syst..

[3]  Anoop Gupta,et al.  The impact of operating system scheduling policies and synchronization methods of performance of parallel applications , 1991, SIGMETRICS '91.

[4]  J. van Leeuwen,et al.  Job Scheduling Strategies for Parallel Processing , 2003, Lecture Notes in Computer Science.

[5]  John K. Ousterhout Scheduling Techniques for Concurrebt Systems. , 1982, ICDCS 1982.

[6]  Dan Tsafrir,et al.  Effects of clock resolution on the scheduling of interactive and soft real-time processes , 2003, SIGMETRICS '03.

[7]  Dror G. Feitelson,et al.  Flexible coscheduling: mitigating load imbalance and improving utilization of heterogeneous resources , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[8]  Larry Rudolph,et al.  Metrics and Benchmarking for Parallel Job Scheduling , 1998, JSSPP.

[9]  Dror G. Feitelson,et al.  The workload on parallel supercomputers: modeling the characteristics of rigid jobs , 2003, J. Parallel Distributed Comput..

[10]  Adi Raveh,et al.  Comparing Logs and Models of Parallel Workloads Using the Co-plot Method , 1999, JSSPP.

[11]  Dror G. Feitelson,et al.  Gang scheduling with memory considerations , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.

[12]  Andrea C. Arpaci-Dusseau,et al.  Implicit coscheduling: coordinated scheduling with implicit information in distributed systems , 2001, TOCS.

[13]  John K. Ousterhout,et al.  Scheduling Techniques for Concurrent Systems , 1982, ICDCS.

[14]  Scott Pakin,et al.  STORM: Lightning-Fast Resource Management , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[15]  David A. Lifka,et al.  The ANL/IBM SP Scheduling System , 1995, JSSPP.

[16]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.

[17]  Wu-chun Feng,et al.  The Quadrics Network: High-Performance Clustering Technology , 2002, IEEE Micro.

[18]  G ValiantLeslie A bridging model for parallel computation , 1990 .

[19]  Uwe Schwiegelshohn,et al.  Theory and Practice in Parallel Job Scheduling , 1997, JSSPP.

[20]  Larry Rudolph,et al.  Gang Scheduling Performance Benefits for Fine-Grain Synchronization , 1992, J. Parallel Distributed Comput..

[21]  Anand Sivasubramaniam,et al.  Improving parallel job scheduling by combining gang scheduling and backfilling techniques , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.

[22]  Dror G. Feitelson,et al.  The Forgotten Factor: Facts on Performance Evaluation and Its Dependence on Workloads , 2002, Euro-Par.

[23]  Liana L. Fong,et al.  An Infrastructure for Efficient Parallel Job Execution in Terascale Computing Environments , 1998, Proceedings of the IEEE/ACM SC98 Conference.