Distributed Probabilistic Network Traffic Measurements

Measuring per-flow traffic in large networks is challenging for two reasons: the measurement must keep up with high line rates, and locally recorded data from multiple routers must be merged to obtain network-wide statistics. Merging is nontrivial because traffic that traverses more than one measurement point must be counted only once, which requires duplicate-insensitive distributed counting mechanisms. Sampling-based traffic accounting, as implemented in today’s routers, results in large approximation errors and does not allow information from multiple points in the network to be merged into network-wide traffic totals. Here, we present Distributed Probabilistic Counting (DPC), an algorithm that obtains duplicate-insensitive distributed per-flow traffic statistics using a probabilistic counting technique. DPC is structurally simple, fast, and highly parallelizable, and therefore lends itself to efficient implementation in both software and hardware. At the same time, it provides highly accurate traffic statistics, as we demonstrate on both synthetic and real-world traffic data.
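To make the idea of duplicate-insensitive merging concrete, here is a minimal sketch of one classic probabilistic counting technique, linear counting over a bitmap, applied to a single flow. It is not necessarily the data structure or estimator DPC uses. The bitmap size `M`, the `FlowBitmap` class, and the use of a byte string as a hop-invariant packet identifier are hypothetical choices made for illustration. Each router sets one hashed bit per packet. Because every router uses the same hash, a packet observed at several points sets the same bit everywhere, so a bitwise OR of the routers' bitmaps counts it only once.

```python
import hashlib
import math

# Hypothetical bitmap size per flow; a larger M lowers the estimation error.
M = 4096


def packet_bit(packet_id: bytes, m: int = M) -> int:
    """Hash a hop-invariant packet identifier (assumed) to a bit position.

    All routers must use the same hash so that the same packet
    always selects the same bit, wherever it is observed.
    """
    digest = hashlib.sha256(packet_id).digest()
    return int.from_bytes(digest[:8], "big") % m


class FlowBitmap:
    """Linear-counting bitmap for one flow (illustrative, not DPC itself)."""

    def __init__(self, m: int = M) -> None:
        self.m = m
        self.bits = 0  # Python int used as an m-bit set

    def record(self, packet_id: bytes) -> None:
        # Setting a bit is idempotent: re-recording a packet changes nothing.
        self.bits |= 1 << packet_bit(packet_id, self.m)

    def merge(self, other: "FlowBitmap") -> "FlowBitmap":
        # Bitwise OR is duplicate-insensitive: a packet seen at both
        # routers sets the same bit in both bitmaps and is counted once.
        if other.m != self.m:
            raise ValueError("bitmaps must have the same size")
        merged = FlowBitmap(self.m)
        merged.bits = self.bits | other.bits
        return merged

    def estimate(self) -> float:
        # Linear counting: n_hat = m * ln(m / z), z = number of zero bits.
        zeros = self.m - bin(self.bits).count("1")
        if zeros == 0:
            return math.inf  # bitmap saturated; m is too small
        return self.m * math.log(self.m / zeros)


if __name__ == "__main__":
    router_a, router_b = FlowBitmap(), FlowBitmap()
    for i in range(600):          # router A sees packets 0..599
        router_a.record(f"pkt-{i}".encode())
    for i in range(400, 1000):    # router B sees packets 400..999
        router_b.record(f"pkt-{i}".encode())
    naive = router_a.estimate() + router_b.estimate()
    merged = router_a.merge(router_b).estimate()
    print(f"sum of local estimates: {naive:.0f}")   # double-counts the overlap
    print(f"merged estimate:        {merged:.0f}")  # should be near 1000 distinct packets
```

In the usage example, both routers observe packets 400 to 599. Adding the two local estimates counts those 200 packets twice, whereas the estimate from the OR-merged bitmap does not. Since OR is commutative, associative, and idempotent, bitmaps can be merged in any order and any number of times without changing the result, which is what makes this kind of structure suitable for network-wide aggregation.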
