Simulation-based HPC workload analysis

Before implementing scheduling policies (e.g., job prioritization) on a system, it is imperative to understand their effects on performance. Changing policies without this knowledge may result in problems such as job starvation, increased queue times, and decreased system utilization. This paper proposes a means of reproducibly and accurately determining the true impact of changes in scheduling policy, resource configuration, and workload distribution. The proposed solution, the Maui Scheduler, includes an advanced, easy-to-use, integrated simulator. The simulator can model a wide range of real-world system configurations, policy sets, and workloads, and it produces statistics for analyzing their impact. This paper describes the capabilities and use of Maui's internal simulator and demonstrates them through a number of real-world examples.
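The abstract does not show Maui's simulator interface, so what follows is only a minimal sketch of the general idea behind trace-driven policy evaluation. It is not Maui's implementation, configuration, or API. It assumes a single pool of identical processors, strict priority ordering with no backfilling, and a made-up four-job trace. `Job`, `simulate`, `fcfs`, and `small_first` are hypothetical names. Replaying the same trace under two prioritization policies shows how a policy change can shift queue times while leaving utilization unchanged.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Job:
    name: str
    submit: float   # submission time (seconds)
    procs: int      # processors requested
    runtime: float  # actual run time (seconds)

# Hypothetical policy signature: higher tuple = higher priority.
Priority = Callable[[Job, float], Tuple]

def simulate(jobs: List[Job], total_procs: int, priority: Priority):
    """Replay a job trace on a space-shared machine under one priority policy.

    Jobs start strictly in priority order with no backfilling: if the
    highest-priority queued job does not fit, nothing else starts.
    Returns (mean wait time, utilization, per-job wait times).
    """
    if not jobs or any(j.procs > total_procs for j in jobs):
        raise ValueError("need a non-empty trace whose jobs fit the machine")
    pending = sorted(jobs, key=lambda j: j.submit)  # not yet submitted
    queue: List[Job] = []                            # submitted, waiting to run
    running: List[Tuple[float, int]] = []            # min-heap of (end_time, procs)
    free, now, i, last_end = total_procs, 0.0, 0, 0.0
    waits: Dict[str, float] = {}
    while i < len(pending) or queue or running:
        # Jump to the next event: a job submission or a job completion.
        events = []
        if i < len(pending):
            events.append(pending[i].submit)
        if running:
            events.append(running[0][0])
        now = max(now, min(events))
        # Release processors held by jobs that have finished.
        while running and running[0][0] <= now:
            free += heapq.heappop(running)[1]
        # Move newly submitted jobs into the queue.
        while i < len(pending) and pending[i].submit <= now:
            queue.append(pending[i])
            i += 1
        # Start jobs in descending priority while the head of the queue fits.
        queue.sort(key=lambda j: priority(j, now), reverse=True)
        while queue and queue[0].procs <= free:
            job = queue.pop(0)
            free -= job.procs
            waits[job.name] = now - job.submit
            heapq.heappush(running, (now + job.runtime, job.procs))
            last_end = max(last_end, now + job.runtime)
    work = sum(j.procs * j.runtime for j in jobs)
    span = last_end - min(j.submit for j in jobs)
    return sum(waits.values()) / len(jobs), work / (total_procs * span), waits

def fcfs(job: Job, now: float) -> Tuple:
    return (-job.submit,)  # earlier submission = higher priority

def small_first(job: Job, now: float) -> Tuple:
    return (-job.procs, -job.submit)  # fewer processors first, then FCFS

# Hypothetical 8-processor machine and four-job trace (not data from the paper).
trace = [Job("A", 0, 8, 10), Job("B", 1, 8, 10), Job("C", 2, 2, 2), Job("D", 2, 2, 2)]
for name, policy in [("FCFS", fcfs), ("small-first", small_first)]:
    mean_wait, util, waits = simulate(trace, 8, policy)
    print(f"{name}: mean wait {mean_wait:.2f}s, utilization {util:.1%}")
```

With this trace, FCFS gives a mean wait of 11.25 s and small-first gives 6.75 s. Utilization is about 95.5% in both cases. However, the wide job B waits 11 s instead of 9 s. Under a steady stream of small jobs, B could be delayed indefinitely, which is the kind of starvation risk a simulator helps expose before a policy is deployed.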
