Inverting sampled traffic

Routers can output statistics about the packets, and flows of packets, that traverse them. Because generating detailed traffic statistics does not scale well with link speed, routers and measurement boxes increasingly implement sampling strategies at the packet level. In this paper, we study, both theoretically and practically, what information about the original traffic can be inferred when sampling, or "thinning", is performed at the packet level. While basic packet-level characteristics such as first-order statistics can be recovered fairly directly, other aspects require more attention. We focus mainly on the spectral density, a second-order statistic, and on the distribution of the number of packets per flow, showing how both can, in theory, be recovered exactly. We then show in detail why in practice this cannot be done with traditional packet-based sampling, even at high sampling rates. We introduce an alternative, flow-based thinning, for which practical inversion is possible even at arbitrarily low sampling rates. We also investigate the theory and practice of fitting, from sampled data, the parameters of a Poisson cluster process that models the full packet traffic.
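
As an illustration of what exact recovery "in theory" can mean for the packets-per-flow distribution, the following is a minimal sketch, not the paper's implementation. It assumes each packet is kept independently with probability p, so a flow of j packets yields a Binomial(j, p) number of sampled packets, and it assumes the sampled distribution is known exactly, including the probability that a flow has no sampled packets at all. Under those assumptions the probability generating functions satisfy G_s(z) = G(1 - p + pz), and the map can be undone formally by a "binomial" transform with parameter 1/p. The function names `thin` and `invert` are hypothetical.

```python
from math import comb

def thin(pmf, p):
    """Flow-size pmf after i.i.d. packet sampling with probability p.

    pmf[j] is the probability that a flow has j packets, j = 0..len(pmf)-1.
    """
    out = [0.0] * len(pmf)
    for j, pj in enumerate(pmf):
        for k in range(j + 1):
            # A flow of j packets keeps k of them with binomial probability.
            out[k] += pj * comb(j, k) * p**k * (1 - p)**(j - k)
    return out

def invert(sampled_pmf, p):
    """Formal inverse of thin(): the same transform with parameter 1/p.

    Assumes sampled_pmf is exact and includes the k = 0 term.
    """
    q = 1.0 / p
    out = [0.0] * len(sampled_pmf)
    for k, pk in enumerate(sampled_pmf):
        for j in range(k + 1):
            # (1 - q) is negative for p < 1, so the weights alternate in sign.
            out[j] += pk * comb(k, j) * q**j * (1 - q)**(k - j)
    return out

# Toy flow-size distribution (an assumption, not data from the paper).
original = [0.0, 0.5, 0.3, 0.0, 0.2]
sampled = thin(original, 0.5)
recovered = invert(sampled, 0.5)
```

With exact inputs, `recovered` matches `original` up to floating-point error. The inverse weights alternate in sign and grow like (1/p)^j, so small errors in an estimated sampled distribution can be strongly amplified, and flows with zero sampled packets are not observed in practice; both points are consistent with, though not taken from, the abstract's remark that practical inversion under packet sampling is difficult.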
