On sampling self-similar Internet traffic

Techniques for sampling Internet traffic are essential to understanding the traffic characteristics of the Internet [A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, F. True, Deriving traffic demands for operational IP networks: methodology and experience, in: Proc. ACM SIGCOMM'00, August 2000, pp. 257-270; N.G. Duffield, M. Grossglauser, Trajectory sampling for direct traffic observation, in: Proc. ACM SIGCOMM'00, August 2000, pp. 271-282]. Despite the research effort devoted to packet sampling, no prior work has taken the self-similarity of Internet traffic into account when devising sampling strategies. In this paper, we perform an in-depth, analytical study of three sampling techniques for self-similar Internet traffic, namely static systematic sampling, stratified random sampling and simple random sampling. We show that while all three sampling techniques can accurately capture the Hurst parameter (second-order statistics) of Internet traffic, they fail to capture the mean (first-order statistics) faithfully. We also show that static systematic sampling yields the smallest variation in results across different sampling instances (i.e., it gives sampling results of high fidelity). Based on an important observation, we then devise a new variation of static systematic sampling, called biased systematic sampling (BSS), that gives much more accurate estimates of the mean while keeping the sampling overhead low. Both the analysis of the three sampling techniques and the evaluation of BSS are performed on synthetic and real Internet traffic traces. Our performance study shows that BSS gives a performance improvement (in terms of efficiency) of 40% and 20% as compared to static systematic and simple random sampling, respectively.
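The three baseline techniques named above can be illustrated with a minimal sketch. It uses the common textbook definitions of systematic, stratified random and simple random sampling, which may differ in detail from the formulations analyzed in the paper. The function names, the `interval` parameter and the Pareto-distributed input are assumptions made for illustration only; in particular, the input is heavy-tailed but independent, so it is not a self-similar trace. BSS is not sketched, since the abstract does not describe its rule.

```python
import random


def systematic_sample(series, interval, offset=0):
    """Static systematic sampling: keep every `interval`-th element,
    starting at position `offset`."""
    return series[offset::interval]


def stratified_random_sample(series, interval, rng):
    """Stratified random sampling: split the series into consecutive
    blocks of `interval` elements and pick one element uniformly at
    random from each complete block."""
    samples = []
    for start in range(0, len(series) - interval + 1, interval):
        samples.append(series[start + rng.randrange(interval)])
    return samples


def simple_random_sample(series, interval, rng):
    """Simple random sampling: draw len(series) // interval elements
    uniformly at random, without replacement, from the whole series."""
    return rng.sample(series, len(series) // interval)


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for a traffic trace: i.i.d. heavy-tailed
    # values. This is NOT self-similar traffic, only a placeholder.
    trace = [rng.paretovariate(1.5) for _ in range(100_000)]
    interval = 100  # assumed sampling interval (1 in 100)

    print("true mean       ", mean(trace))
    print("systematic      ", mean(systematic_sample(trace, interval)))
    print("stratified      ", mean(stratified_random_sample(trace, interval, rng)))
    print("simple random   ", mean(simple_random_sample(trace, interval, rng)))
```

All three functions select about the same number of samples for a given `interval`, so their mean estimates can be compared at equal sampling overhead. Repeating the run with different seeds or offsets gives a rough feel for the variation across sampling instances that the abstract refers to.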
