An In-Depth, Analytical Study of Sampling Techniques for Self-Similar Internet Traffic

Techniques for sampling Internet traffic are very important for understanding the traffic characteristics of the Internet (Feldmann et al., 2000). In spite of all the research efforts on packet sampling, none has taken the self-similarity of Internet traffic into account in devising sampling strategies. In this paper, we perform an in-depth, analytical study of three sampling techniques for self-similar Internet traffic, namely static systematic sampling, stratified random sampling, and simple random sampling. We show that while all three sampling techniques can accurately capture the Hurst parameter (second-order statistics) of Internet traffic, they fail to capture the mean (first-order statistics) faithfully. We also show that static systematic sampling yields the smallest variation of sampling results across different sampling instances (i.e., it gives sampling results of high fidelity). Based on an important observation, we then devise a new variation of static systematic sampling, called biased systematic sampling (BSS), that gives much more accurate estimates of the mean while keeping the sampling overhead low. Both the analysis of the three sampling techniques and the evaluation of BSS are performed on synthetic and real Internet traffic traces. Our performance study shows that BSS gives a performance improvement of 40% and 20% (in terms of efficiency) as compared to static systematic and simple random sampling, respectively.
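The three baseline techniques named in the abstract can be illustrated with a minimal Python sketch. It follows the textbook definitions of these sampling schemes, not necessarily the exact procedures used in the paper. The function names, the `interval` and `n` parameters, and the Pareto-distributed synthetic trace are all assumptions made for illustration. BSS is not sketched because the abstract does not describe how it works.

```python
import random


def systematic_sample(trace, interval, offset=0):
    """Static systematic sampling: keep every `interval`-th element,
    starting at position `offset`."""
    return trace[offset::interval]


def stratified_random_sample(trace, interval, rng):
    """Stratified random sampling: split the trace into consecutive
    buckets of `interval` elements and pick one element uniformly at
    random from each bucket (the last bucket may be shorter)."""
    samples = []
    for start in range(0, len(trace), interval):
        bucket = trace[start:start + interval]
        samples.append(rng.choice(bucket))
    return samples


def simple_random_sample(trace, n, rng):
    """Simple random sampling: draw `n` elements uniformly without
    replacement, returned in their original (time) order."""
    indices = sorted(rng.sample(range(len(trace)), n))
    return [trace[i] for i in indices]


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for a traffic trace: heavy-tailed values
    # drawn from a Pareto distribution (an assumption, not the paper's data).
    trace = [rng.paretovariate(1.5) for _ in range(100_000)]
    interval = 100

    print("true mean:        ", mean(trace))
    print("systematic mean:  ", mean(systematic_sample(trace, interval)))
    print("stratified mean:  ", mean(stratified_random_sample(trace, interval, rng)))
    print("simple random mean:",
          mean(simple_random_sample(trace, len(trace) // interval, rng)))
```

Running the script several times with different seeds gives a rough feel for how much the sampled mean can vary between sampling instances on heavy-tailed data. It does not reproduce the paper's analysis or results.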

[1] Walter Willinger, et al. Self-similarity through high-variability: statistical analysis of Ethernet LAN traffic at the source level, 1997, TNET.

[2] Carsten Lund, et al. Properties and prediction of flow statistics from sampled packet streams, 2002, IMW '02.

[3] Patrice Abry, et al. Real-Time Estimation of the Parameters of Long-Range Dependence (Extended Version), 2000.

[4] Ratul Mahajan, et al. Controlling High Bandwidth Aggregates in the Network (Extended Version), 2001.

[5] Ratul Mahajan, et al. Controlling high bandwidth aggregates in the network, 2002, CCRV.

[6] Walter Willinger, et al. On the self-similar nature of Ethernet traffic, 1993, SIGCOMM '93.

[7] Walter Willinger, et al. Self-similarity and heavy tails: structural modeling of network traffic, 1998.

[8] Walter Willinger, et al. Self-Similar Network Traffic and Performance Evaluation, 2000.

[9] Anja Feldmann, et al. Deriving traffic demands for operational IP networks: methodology and experience, 2000, SIGCOMM.

[10] Stefano Giordano, et al. A measurement based QoS evaluation through traffic sampling, 1998.

[11] Carsten Lund, et al. Charging from sampled network usage, 2001, IMW '01.

[12] George C. Polyzos, et al. Application of sampling methodologies to network traffic characterization, 1993, SIGCOMM '93.

[13] Nicolas Hohn, et al. Inverting sampled traffic, 2003, IEEE/ACM Transactions on Networking.

[14] Zhi-Li Zhang, et al. Adaptive random sampling for load change detection, 2002, SIGMETRICS '02.

[15] Walter Willinger, et al. Experimental queueing analysis with long-range dependent packet traffic, 1996, TNET.

[16] Lester Lipsky, et al. Long-lasting transient conditions in simulations with heavy-tailed workloads, 1997, WSC '97.

[17] George Varghese, et al. New directions in traffic measurement and accounting, 2002, CCRV.

[18] Jin Cao, et al. The effect of statistical multiplexing on Internet packet traffic, 2001.

[19] Nick G. Duffield, et al. Trajectory sampling for direct traffic observation, 2001, TNET.

[20] Anja Feldmann, et al. Deriving traffic demands for operational IP networks: methodology and experience, 2001, TNET.

[21] A. Winsor. Sampling techniques, 2000, Nursing times.

[22] Carsten Lund, et al. Estimating flow distributions from sampled flow statistics, 2005, TNET.