Metrics for Parallel Job Scheduling and Their Convergence

The arrival process of jobs submitted to a parallel system is bursty, leading to fluctuations in the load at many time scales. In particular, rare events of extreme load may occur. Such events lead to an increase in the standard deviation of performance metrics, and thus delay the convergence of simulations used to evaluate the scheduling. Different performance metrics have been proposed in an effort to reduce this variability, and indeed display different rates of convergence. However, there is no single metric that outperforms the others under all conditions. Rather, the convergence of different metrics depends on the system being studied.

[1]  Dror G. Feitelson,et al.  Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling , 2001, IEEE Trans. Parallel Distributed Syst..

[2]  Dmitry N. Zotkin,et al.  Job-length estimation and performance in backfilling schedulers , 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).

[3]  Teunis J. Ott,et al.  Load-balancing heuristics and process behavior , 1986, SIGMETRICS '86/PERFORMANCE '86.

[4]  Dror G. Feitelson,et al.  Packing Schemes for Gang Scheduling , 1996, JSSPP.

[5]  Francine Berman,et al.  Adaptive Selection of Partition Size for Supercomputer Requests , 2000, JSSPP.

[6]  G Kotsis Workload Models for Parallel Computer Systems , 1994 .

[7]  Per Brinch Hansen An Analysis of Response Ratio Scheduling , 1971, IFIP Congress.

[8]  Ray Jain,et al.  The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling , 1991, Wiley professional computing.

[9]  Edward D. Lazowska,et al.  The limited performance benefits of migrating active processes for load sharing , 1988, SIGMETRICS '88.

[10]  Azer Bestavros,et al.  Self-similarity in World Wide Web traffic: evidence and possible causes , 1996, SIGMETRICS '96.

[11]  M. H. MacDougall Simulating computer systems: techniques and tools , 1989 .

[12]  David A. Lifka,et al.  The ANL/IBM SP Scheduling System , 1995, JSSPP.

[13]  Krzysztof Pawlikowski,et al.  Steady-state simulation of queueing processes: survey of problems and solutions , 1990, CSUR.

[14]  Mor Harchol-Balter,et al.  Exploiting process lifetime distributions for dynamic load balancing , 1995, SIGMETRICS.

[15]  Niv Ahituv,et al.  SPEC as a Performance Evaluation Measure , 1995, Computer.

[16]  Allen B. Downey,et al.  The elusive goal of workload characterization , 1999, PERV.

[17]  Fang Wang,et al.  Modeling of Workload in MPPs , 1997, JSSPP.

[18]  Lester Lipsky,et al.  Long-lasting transient conditions in simulations with heavy-tailed workloads , 1997, WSC '97.

[19]  Philip Heidelberger,et al.  Fast simulation of rare events in queueing and reliability models , 1993, TOMC.

[20]  Uwe Schwiegelshohn,et al.  Theory and Practice in Parallel Job Scheduling , 1997, JSSPP.

[21]  Robert F. Rosin Determining a computing center environment , 1965, CACM.

[22]  Mor Harchol-Balter,et al.  On Choosing a Task Assignment Policy for a Distributed Server System , 1998, J. Parallel Distributed Comput..

[23]  Allen B. Downey,et al.  A parallel workload model and its implications for processor allocation , 1996, Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183).