Learn more, sample less: control of volume and variance in network measurement

This paper deals with sampling objects from a large stream. Each object possesses a size, and the aim is to estimate the total size of an arbitrary subset of objects whose composition is not known at the time of sampling. This problem is motivated by network measurement, in which the objects are flow records exported by routers and the sizes are the numbers of packets or bytes reported in each record. Subsets of interest could be flows from a certain customer or flows from a worm attack. This paper introduces threshold sampling as a sampling scheme that optimally controls the expected volume of samples and the variance of estimators over any classification of flows. It provides algorithms for dynamic control of sample volumes and evaluates them on flow data gathered from a commercial Internet Protocol (IP) network. The algorithms are simple to implement and robust to variation in network conditions. The work reported here has been applied in the measurement infrastructure of the commercial IP network. Had sampling not been employed, accommodating the measurement traffic and its processing would have entailed an order of magnitude greater capital expenditure.
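The abstract does not spell out the sampling rule, so the following is only a minimal sketch of threshold sampling as it is commonly described: a record of size x is kept with probability min(1, x/z) for a threshold z, and a kept record is reported with weight max(x, z), which makes the estimate of any subset total unbiased. The function names, the index-based subset representation, and the example sizes are assumptions made for illustration, not the paper's own code; the dynamic adjustment of z mentioned above is not shown.

```python
import random


def threshold_sample(sizes, z, rng=None):
    """Keep record i with probability min(1, x/z); report weight max(x, z).

    Illustrative sketch: `sizes` is a list of non-negative record sizes and
    `z` is a positive threshold. Returns a list of (index, weight) pairs.
    """
    if rng is None:
        rng = random.Random()
    samples = []
    for i, x in enumerate(sizes):
        p = min(1.0, x / z)          # large records (x >= z) are always kept
        if rng.random() < p:
            samples.append((i, max(x, z)))  # weight equals x / p
    return samples


def estimate_total(samples, subset):
    """Estimate the total size of the records whose indices are in `subset`."""
    return sum(weight for i, weight in samples if i in subset)


def expected_volume(sizes, z):
    """Expected number of sampled records for threshold z."""
    return sum(min(1.0, x / z) for x in sizes)


if __name__ == "__main__":
    # Hypothetical flow sizes in bytes; the subset is chosen after sampling.
    flow_sizes = [1, 2, 50, 400, 3, 1200, 7]
    kept = threshold_sample(flow_sizes, z=100, rng=random.Random(1))
    print("sampled (index, weight):", kept)
    print("estimate for subset {0, 2, 3}:", estimate_total(kept, {0, 2, 3}))
    print("expected sample volume:", expected_volume(flow_sizes, 100))
```

Raising z lowers the expected sample volume at the cost of higher variance for subsets made up of small records, which is the trade-off that the choice of threshold controls.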
