An online framework for catching top spreaders and scanners

Flow-level information is important for many applications in network measurement and analysis. In this work, we tackle the "Top Spreaders" and "Top Scanners" problems, in which the hosts spreading the largest numbers of flows, especially small flows, must be identified efficiently and accurately. Identifying these top users is helpful in network management, traffic engineering, application behavior analysis, and anomaly detection. We propose novel streaming algorithms and a "Filter-Tracker-Digester" framework to catch the top spreaders and scanners online. Our framework combines sampling with streaming algorithms, and deterministic with randomized algorithms, so that they complement each other to improve accuracy while reducing memory usage and processing time. To our knowledge, we are the first to tackle the "Top Scanners" problem in a streaming fashion. We address several challenges: traffic scale, skewness, speed, memory usage, and result accuracy. We derive performance bounds for our algorithms analytically and evaluate them on both real and synthetic traces, showing that our algorithms achieve accuracy and speed at least an order of magnitude higher than existing approaches.
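
To make the problem statement concrete, the following is a minimal sketch of an exact, offline baseline for the Top Spreaders problem. It is not the Filter-Tracker-Digester framework or any algorithm from this work. It assumes, for illustration only, that a flow is identified by a (source, destination) pair and that a spreader's score is its number of distinct destinations; the name `top_spreaders` and its parameters are hypothetical.

```python
from collections import defaultdict
import heapq


def top_spreaders(flows, k):
    """Exact baseline: rank sources by their number of distinct destinations.

    `flows` is an iterable of (src, dst) pairs (an assumed flow identifier).
    Returns up to k (src, distinct_dst_count) pairs, largest count first.
    """
    fanout = defaultdict(set)  # src -> set of distinct destinations seen
    for src, dst in flows:
        fanout[src].add(dst)   # repeated (src, dst) pairs are counted once
    counts = ((len(dsts), src) for src, dsts in fanout.items())
    return [(src, n) for n, src in heapq.nlargest(k, counts)]
```

This baseline stores every distinct destination for every source, so its memory grows with the traffic itself. That cost is the kind of overhead that streaming and sampling approaches aim to avoid at high link speeds.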
