Predicting queue times on space-sharing parallel computers

We present statistical techniques for predicting the queue times experienced by jobs submitted to a space-sharing parallel machine with first-come-first-served (FCFS) scheduling. We apply these techniques to trace data from the Intel Paragon at the San Diego Supercomputer Center and the IBM SP2 at the Cornell Theory Center. We show that it is possible to predict queue times with accuracy that is acceptable for several intended applications. The coefficient of correlation between our predicted queue times and the actual queue times from simulated schedules is between 0.65 and 0.72.
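As an illustration of the accuracy measure quoted above, the following is a minimal sketch of a coefficient of correlation between predicted and actual queue times. It assumes the plain Pearson coefficient computed on untransformed values, which the abstract does not specify. The function name `pearson_r` and the sample queue times are hypothetical and are not taken from the SDSC or CTC traces.

```python
import math


def pearson_r(predicted, actual):
    """Pearson coefficient of correlation between two equal-length sequences.

    Assumption: the abstract does not say which correlation measure was used;
    this sketch uses the standard Pearson formula.
    """
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_a)


# Hypothetical queue times in seconds, made up for illustration only.
predicted = [60.0, 300.0, 1200.0, 30.0, 900.0]
actual = [45.0, 600.0, 800.0, 90.0, 1500.0]
r = pearson_r(predicted, actual)
print(f"coefficient of correlation: {r:.2f}")
```

A value near 1 means the predictions rank and scale jobs much as the simulated schedule does, while a value near 0 means the predictions carry little information about the actual queue times.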
