Finding Heavy Hitters by Packet Count Flow Sampling

In many applications, ranging from network congestion monitoring to data mining, it is often desirable to identify the items in a large data set whose frequency exceeds a given threshold. Doing so helps find the heaviest users, the most popular web sites, and so on. Our work focuses on the problem of finding heavy hitters by packet count, which is especially suited to detecting attacks such as SYN floods and port scans. Such anomalies do not occupy much bandwidth, but they can still seriously affect the Internet. A major difficulty with detecting heavy hitters at a high-speed monitoring point is that the traffic can contain millions of flows. We therefore present a threshold sampling technique that selects large flows in preference to small ones and controls the resources consumed by adjusting the threshold. The main steps of the method are aggregating packet counts by source IP address and sorting the aggregated counts. The experimental results show that the heavy hitters found in the sample approximate those in the original data set, indicating that our method is effective.
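The steps named above (aggregation by source IP address, threshold sampling, sorting) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard threshold sampling rule, in which a source with packet count n is kept with probability min(1, n / z) for a threshold z, and the function names `aggregate_by_source`, `threshold_sample`, and `heavy_hitters` are hypothetical.

```python
import random
from collections import Counter


def aggregate_by_source(packets):
    """Count packets per source IP; `packets` is an iterable of source-IP strings."""
    return Counter(packets)


def threshold_sample(counts, z, rng=random):
    """Keep each source with probability min(1, count / z).

    Assumption: this is the textbook threshold sampling rule, not
    necessarily the exact rule used in the paper. Sources with
    count >= z are always kept; smaller ones are kept at random.
    """
    if z <= 0:
        raise ValueError("threshold z must be positive")
    sample = {}
    for src, n in counts.items():
        if n >= z or rng.random() < n / z:
            sample[src] = n
    return sample


def heavy_hitters(sample, k):
    """Return the k sampled sources with the largest packet counts."""
    return sorted(sample.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    # Synthetic traffic: two heavy sources and 200 sources with one packet each.
    packets = ["10.0.0.1"] * 500 + ["10.0.0.2"] * 300
    packets += [f"192.168.0.{i}" for i in range(200)]
    counts = aggregate_by_source(packets)
    sample = threshold_sample(counts, z=100, rng=random.Random(0))
    print(len(counts), "sources before sampling,", len(sample), "after")
    print(heavy_hitters(sample, k=2))
```

Raising z shrinks the expected sample, and with it the memory and sorting cost, while every source at or above z is still retained. This is how adjusting the threshold trades resource consumption against coverage of the smaller flows.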
