Beyond Poisson: Modeling Inter-Arrival Time of Requests in a Datacenter

How frequently are computer jobs submitted to an industrial-scale datacenter? We investigate a trace of job requests and executions collected in a large-scale industrial datacenter, which spans nearly half a terabyte. In this paper, we discover and explain two surprising patterns in the inter-arrival time (IAT) of job requests: (a) multiple periodicities and (b) multi-level bundling effects. We propose a novel generative process, the Hierarchical Bundling Model (HiBM), for modeling the data. HiBM is able to mimic the multiple components in the distribution of IAT and to simulate job requests with the same statistical properties as the real data. We also provide a systematic approach to estimating the parameters of HiBM.
