Flow sampling under hard resource constraints

Many network management applications take as their input traffic volumes broken down by attributes such as IP address or port number. IP flow records are commonly collected for this purpose, since they allow fine-grained usage of network resources to be determined. However, the increasingly large volumes of flow statistics impose corresponding costs on the resources of the measurement infrastructure. This motivates sampling of flow records.

This paper addresses sampling strategies for flow records. Recent work has shown that non-uniform sampling is necessary in order to control the estimation variance arising from the observed heavy-tailed distribution of flow lengths. However, while this approach controls estimator variance, it does not place hard limits on the number of flows sampled. Such limits are often required during the arbitrary downstream sampling, resampling and aggregation operations employed in analysis of the data.

This paper proposes a correlated sampling strategy that is able to select an arbitrarily small number of the "best" representatives of a set of flows. We show that usage estimates arising from such a selection are unbiased, and show how to estimate their variance, both offline for modeling purposes and online during the sampling itself. The selection algorithm can be implemented in a queue-like data structure in which memory usage is uniformly bounded during measurement. Finally, we compare the complexity and performance of our scheme with other potential approaches.
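The abstract does not spell out the selection rule, so the following is only a minimal sketch of one scheme that fits the description: a fixed sample size k, unbiased usage estimates, and a bounded queue-like structure. It assumes a priority-based rule in which each flow of size x draws a uniform random number u in (0, 1], receives priority x / u, and the k flows with the highest priorities are kept. The (k+1)-th highest priority serves as a threshold, and each kept flow is assigned the estimate max(x, threshold). The names `priority_sample`, `flows` and `k` are hypothetical, and whether this matches the paper's algorithm in detail is an assumption.

```python
import heapq
import random


def priority_sample(flows, k, rng=None):
    """Keep at most k flows; return ([(key, usage_estimate), ...], threshold).

    flows: iterable of (key, size) pairs with size > 0.
    Assumed rule (not taken from the abstract): priority = size / u.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (priority, arrival_index, key, size); at most k + 1 entries
    for idx, (key, size) in enumerate(flows):
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids division by zero
        item = (size / u, idx, key, size)
        if len(heap) <= k:
            heapq.heappush(heap, item)
        else:
            # Memory stays bounded: insert the new item, evict the lowest priority.
            heapq.heappushpop(heap, item)

    if len(heap) <= k:
        # Fewer than k + 1 flows were seen: nothing was discarded, so sizes are exact.
        return [(key, float(size)) for _, _, key, size in heap], 0.0

    # The (k + 1)-th highest priority becomes the threshold and is not itself sampled.
    threshold = heapq.heappop(heap)[0]
    sample = [(key, max(float(size), threshold)) for _, _, key, size in heap]
    return sample, threshold


if __name__ == "__main__":
    demo = [("10.0.0.1", 100), ("10.0.0.2", 1), ("10.0.0.3", 1),
            ("10.0.0.4", 50), ("10.0.0.5", 2)]
    sample, tau = priority_sample(demo, k=2, rng=random.Random(1))
    print(sample, tau)
```

In this sketch the heap never holds more than k + 1 entries, which is the sense in which memory usage is bounded regardless of how many flows arrive. Summing the estimates of the sampled flows that match a given attribute (for example, a port number) gives an estimate of that attribute's total usage.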
