Stochastic Queuing Simulation for Data Center Workloads

Data center systems and workloads are increasing in importance, yet there are few methods for evaluating potential changes to these systems. We introduce a new methodology for data center evaluation, called Stochastic Queuing Simulation (SQS). At its heart, SQS is a parallel, large-scale stochastic discrete time simulation of generalized queuing models that are driven by empirically observed arrival and service distributions. SQS provides numerous practical advantages over alternative large-scale simulation techniques (e.g., trace-driven simulation), including statistical rigor and reduced turnaround time. We detail our methodology, workload suite, and the practical concerns associated with them. To demonstrate our technique, we carry out a case study of data center power capping for 1000 servers. Finally, we discuss open research challenges for making SQS more robust.
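
As an illustration of the core idea, a queuing model driven by empirically observed arrival and service distributions, the following is a minimal single-process sketch and not the SQS implementation. It assumes a first-come-first-served queue with k identical servers, resamples interarrival and service times from lists of observed values, and for brevity advances from arrival to arrival rather than in fixed time steps. The function name `simulate_ggk` and all of its parameters are hypothetical.

```python
import heapq
import random


def simulate_ggk(interarrivals, services, k, n_jobs, seed=0):
    """Simulate a FCFS queue with k servers; return per-job response times.

    interarrivals, services: lists of observed samples (assumed inputs),
    resampled with replacement to drive the model.
    """
    rng = random.Random(seed)
    # Min-heap holding the time at which each server next becomes free.
    free_at = [0.0] * k
    heapq.heapify(free_at)
    clock = 0.0
    response_times = []
    for _ in range(n_jobs):
        # Draw the next arrival and its service demand from the samples.
        clock += rng.choice(interarrivals)
        service = rng.choice(services)
        # Under FCFS, the job runs on whichever server frees up first.
        earliest = heapq.heappop(free_at)
        start = max(clock, earliest)
        finish = start + service
        heapq.heappush(free_at, finish)
        response_times.append(finish - clock)
    return response_times


if __name__ == "__main__":
    # Hypothetical samples, in seconds; real runs would use measured data.
    observed_interarrivals = [0.8, 1.1, 0.9, 1.4, 0.6]
    observed_services = [1.5, 2.0, 0.7, 3.1, 1.2]
    times = simulate_ggk(observed_interarrivals, observed_services,
                         k=2, n_jobs=10000)
    print("mean response time:", sum(times) / len(times))
```

A statistically rigorous study would additionally discard warm-up observations and report confidence intervals on the estimates, which this sketch omits.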
