Workload sanitation for performance evaluation

The performance of computer systems depends, among other things, on the workload. Performance evaluations are therefore often done using logs of workloads on current productions systems, under the assumption that such real workloads are representative and reliable; likewise, workload modeling is typically based on real workloads. We show, however, that real workloads may also contain anomalies that make them non-representative and unreliable. This is a special case of multi-class workloads, where one class is the "real" workload which we wish to use in the evaluation, and the other class contaminates the log with "bogus" data. We provide several examples of this situation, including a previously unrecognized type of anomaly we call "workload flurries": surges of activity with a repetitive nature, caused by a single user, that dominate the workload for a relatively short period. Using a workload with such anomalies in effect emphasizes rare and unique events (e.g. occurring for a few days out of two years of logged data), and risks optimizing the design decision for the anomalous workload at the expense of the normal workload. Thus we claim that such anomalies should be removed from the workload before it is used in evaluations, and that ignoring them is actually an unjustifiable approach.

[1]  Steven Hotovy,et al.  Workload Evolution on the Cornell Theory Center IBM SP2 , 1996, JSSPP.

[2]  Mark Burgess,et al.  Measuring system normality , 2002, TOCS.

[3]  Averill M. Law,et al.  Simulation Modeling and Analysis , 1982 .

[4]  Mary K. Vernon,et al.  Characteristics of a Large Shared Memory Production Workload , 2001, JSSPP.

[5]  Giuseppe Serazzi,et al.  Workload characterization: a survey , 1993, Proc. IEEE.

[6]  Dror G. Feitelson,et al.  The workload on parallel supercomputers: modeling the characteristics of rigid jobs , 2003, J. Parallel Distributed Comput..

[7]  Fang Wang,et al.  Modeling of Workload in MPPs , 1997, JSSPP.

[8]  Martin F. Arlitt,et al.  Web server workload characterization: the search for invariants , 1996, SIGMETRICS '96.

[9]  Ashok K. Agrawala,et al.  An Approach to the Workload Characterization Problem , 1976, Computer.

[10]  Carey L. Williamson,et al.  Internet Web servers: workload characterization and performance implications , 1997, TNET.

[11]  Domenico Ferrari,et al.  Workload charaterization and Selection in Computer Performance Measurement , 1972, Computer.

[12]  Dan Tsafrir,et al.  Instability in parallel job scheduling simulation: the role of workload flurries , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[13]  Martin Arlitt,et al.  A workload characterization study of the 1998 World Cup Web site , 2000, IEEE Netw..

[14]  Bo Hong,et al.  Managing flash crowds on the Internet , 2003, 11th IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer Telecommunications Systems, 2003. MASCOTS 2003..

[15]  Francine Berman,et al.  A comprehensive model of the supercomputer workload , 2001 .

[16]  David A. Lifka,et al.  The ANL/IBM SP Scheduling System , 1995, JSSPP.

[17]  Dror G. Feitelson,et al.  Job Characteristics of a Production Parallel Scientivic Workload on the NASA Ames iPSC/860 , 1995, JSSPP.

[18]  Jim Gemmell,et al.  Using Multicast FEC to Solve the Midnight Madness Problem , 1997 .

[19]  Dror G. Feitelson,et al.  Metric and workload effects on computer systems evaluation , 2003, Computer.