Workload Characteristics of a Multi-cluster Supercomputer

This paper presents a comprehensive characterization of the workload of a multi-cluster supercomputer, based on twelve months of scientific research traces. The metrics we characterize include system utilization, job arrival rate and interarrival time, job cancellation rate, job size (degree of parallelism), job runtime, memory usage, and user/group behavior. Correlations between metrics (such as job runtime versus memory usage, and requested versus actual runtime) are identified and studied extensively. Differences from previously reported workloads are identified, and statistical distributions are fitted so that synthetic workloads with the same characteristics can be generated. This study provides a realistic basis for experiments in resource management and for evaluating different scheduling strategies in a multi-cluster research environment.
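The abstract describes fitting statistical distributions to trace metrics and measuring correlations between them so that synthetic workloads can be generated. It does not say which distributions were chosen, so the sketch below assumes, purely for illustration, exponential interarrival times and lognormal runtimes, both fitted by maximum likelihood using only Python's standard library. The function names (`fit_exponential`, `fit_lognormal`, `pearson`, `synthesize`) are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics


def fit_exponential(interarrivals):
    """MLE fit of an exponential distribution (assumed model): rate = 1 / sample mean."""
    return 1.0 / statistics.fmean(interarrivals)


def fit_lognormal(values):
    """Closed-form MLE fit of a lognormal distribution (assumed model).

    Returns (mu, sigma) of the normal distribution followed by log(x).
    Non-positive values are skipped because log(x) is undefined for them.
    """
    logs = [math.log(v) for v in values if v > 0]
    mu = statistics.fmean(logs)
    # The MLE uses the population (1/n) standard deviation, not the 1/(n-1) one.
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def synthesize(n, rate, mu, sigma, seed=0):
    """Generate n synthetic jobs as (arrival_time, runtime) pairs.

    Interarrival times come from the fitted exponential and runtimes from
    the fitted lognormal; the two are sampled independently of each other.
    """
    rng = random.Random(seed)
    t = 0.0
    jobs = []
    for _ in range(n):
        t += rng.expovariate(rate)
        jobs.append((t, rng.lognormvariate(mu, sigma)))
    return jobs
```

In use, one would fit `rate` to the trace's interarrival times and `(mu, sigma)` to its runtimes, then call `synthesize`. A correlation such as runtime versus memory usage could be measured with `pearson` on log-transformed values, since both metrics tend to be heavy-tailed. Note that this sketch samples each metric independently, so it would not reproduce the correlations the abstract mentions; a generator faithful to them would need to model the correlated metrics jointly.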