An Evaluation of Parallel Job Scheduling for ASCI Blue-Pacific

In this paper we analyze the behavior of a gang-scheduling system that we are developing for the ASCI Blue-Pacific machines. Starting with a real workload obtained from job logs of one of the ASCI machines, we generate a statistical model of this workload using Hyper-Erlang distributions. We then vary the parameters of those distributions to generate a range of workloads, each representative of a different operating point of the machine. Through simulation, we obtain performance characteristics for three scheduling strategies: (i) first-come first-serve, (ii) gang-scheduling, and (iii) backfilling. Our results show that both backfilling and gang-scheduling with moderate multiprogramming levels are much more effective than simple first-come first-serve scheduling. In addition, we show that gang-scheduling can display better performance characteristics than backfilling, particularly for large production jobs.
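To illustrate the workload-modeling step, the following is a minimal sketch of how synthetic values (for example, job inter-arrival or service times) could be drawn from a hyper-Erlang distribution of common order, that is, a probabilistic mixture of Erlang branches that share the same number of stages. The function names and the parameter values (`probs`, `rates`, `order`) are assumptions chosen for illustration; they are not the values fitted to the ASCI job logs, and the sketch does not cover the fitting procedure itself.

```python
import random


def sample_hyper_erlang(probs, rates, order, rng):
    """Draw one sample from a mixture of Erlang distributions of common order.

    probs[i] is the probability of choosing branch i, rates[i] is the
    per-stage exponential rate of branch i, and every branch has `order`
    stages. All names here are hypothetical, for illustration only.
    """
    # Pick a branch according to the mixing probabilities.
    branch = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # An Erlang(order, rate) variate is the sum of `order` independent
    # exponential variates with the same rate.
    return sum(rng.expovariate(rates[branch]) for _ in range(order))


def hyper_erlang_mean(probs, rates, order):
    """Analytic mean: sum over branches of p_i * order / rate_i."""
    return sum(p * order / lam for p, lam in zip(probs, rates))


# Illustrative parameters (assumed, not taken from the paper).
probs = [0.7, 0.3]   # mixing probabilities, must sum to 1
rates = [2.0, 0.1]   # per-stage rates: one "short" and one "long" branch
order = 2            # common number of stages in every branch

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_hyper_erlang(probs, rates, order, rng) for _ in range(100_000)]

empirical_mean = sum(samples) / len(samples)
analytic_mean = hyper_erlang_mean(probs, rates, order)
print(f"empirical mean = {empirical_mean:.3f}, analytic mean = {analytic_mean:.3f}")
```

Changing `probs` and `rates` while keeping the same structure is one way to produce a family of synthetic workloads, in the spirit of varying the distribution parameters to reach different operating points.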
