Comprehensive Workload Analysis and Modeling of a Petascale Supercomputer

The performance of a supercomputer scheduler is greatly affected by the characteristics of the workload it serves, so a good understanding of those characteristics is essential for developing and evaluating scheduling strategies for an HPC system. In this paper, we present a comprehensive analysis of the workload characteristics of Kraken, the world’s fastest academic supercomputer and 11th on the latest Top500 list, with 112,896 compute cores and a peak performance of 1.17 petaflops. The study uses twelve months of workload traces gathered on the system, covering about 700,000 jobs submitted by more than one thousand users from 25 research areas. We investigate three categories of workload characteristics: 1) general characteristics, including the distribution of jobs over research fields and queues, the distribution of job sizes for an individual user, the job cancellation rate, the job termination rate, and walltime request accuracy; 2) temporal characteristics, including monthly machine utilization, job temporal distributions over different time periods, and job inter-arrival times, both between temporally adjacent jobs and between jobs submitted by the same user; 3) execution characteristics, including the distribution of each job attribute, such as queuing time, actual runtime, job size, and memory usage, and the correlations between these attributes. This work provides a realistic basis for scheduler design and comparison by studying the supercomputer’s workload with new approaches, such as Gaussian mixture models, and from new viewpoints, such as that of the user community. To the best of our knowledge, this is the first study to systematically investigate the workload characteristics of a petascale supercomputer dedicated to open scientific research.
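As a rough illustration of two of the quantities named above, the following Python sketch computes walltime request accuracy and job inter-arrival times from a toy trace. The `Job` record, its field names, and the definition of accuracy as actual runtime divided by requested walltime are assumptions made for this example; the abstract does not specify the trace format or the exact metric definitions used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical trace record; field names are assumptions, not the Kraken trace format.
    user: str
    submit: float     # submission time in seconds
    requested: float  # requested walltime in seconds
    runtime: float    # actual runtime in seconds


def walltime_accuracy(job):
    """Actual runtime divided by requested walltime (assumed definition)."""
    return job.runtime / job.requested


def inter_arrival_times(jobs):
    """Gaps between temporally adjacent submissions in the given job list."""
    times = sorted(j.submit for j in jobs)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def per_user_inter_arrival_times(jobs):
    """Inter-arrival times computed separately for each user's jobs."""
    by_user = {}
    for j in jobs:
        by_user.setdefault(j.user, []).append(j)
    return {user: inter_arrival_times(js) for user, js in by_user.items()}


# Made-up jobs for illustration only; these are not measurements from the study.
trace = [
    Job("alice", 0.0, 3600.0, 1800.0),
    Job("bob", 60.0, 7200.0, 7200.0),
    Job("alice", 300.0, 600.0, 150.0),
]
```

Under these assumptions, an accuracy of 1.0 means the job used its entire request, and values near 0 indicate heavy over-estimation. The system-wide and per-user inter-arrival views differ only in how the jobs are grouped before sorting.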
