An Efficient K-Persistent Spread Estimator for Traffic Measurement in High-Speed Networks

Traffic measurement in high-speed networks serves many important functions, including improving network performance, assisting resource allocation, and detecting anomalies. In this paper, we study a generalized problem called <inline-formula> <tex-math notation="LaTeX">${k}$ </tex-math></inline-formula>-persistent spread estimation, which measures the volume of persistent traffic elements in each flow, namely those that appear during at least <inline-formula> <tex-math notation="LaTeX">${k}$ </tex-math></inline-formula> out of <inline-formula> <tex-math notation="LaTeX">${t}$ </tex-math></inline-formula> measurement periods, where <inline-formula> <tex-math notation="LaTeX">${k}$ </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">${t}$ </tex-math></inline-formula> are two positive integers that can be arbitrarily set in user queries, with <inline-formula> <tex-math notation="LaTeX">${k} \le {t}$ </tex-math></inline-formula>. Solutions to this problem have interesting applications in network attack detection, popular content identification, user access profiling, etc. Prior art on this problem is very limited and addresses only the special case of <inline-formula> <tex-math notation="LaTeX">${k} = {t}$ </tex-math></inline-formula> under a flawed assumption. Removing this assumption, we propose an efficient and accurate estimator for generalized <inline-formula> <tex-math notation="LaTeX">${k}$ </tex-math></inline-formula>-persistent traffic measurement, with <inline-formula> <tex-math notation="LaTeX">${k} \le {t}$ </tex-math></inline-formula>. Our method relies on bitwise SUM, instead of the bitwise AND used in the prior art, to combine the information collected from different periods. This change has a fundamental impact on the probabilistic analysis that derives the estimator, particularly over space-saving virtual bitmaps.
Using real network traces, we experimentally demonstrate the effectiveness of our new method in estimating the <inline-formula> <tex-math notation="LaTeX">${k}$ </tex-math></inline-formula>-persistent spreads of all network flows. Our estimator performs much better than the prior art in the latter's special case of <inline-formula> <tex-math notation="LaTeX">${k} = {t}$ </tex-math></inline-formula>. We also incorporate a sampling module into the estimator for improved flexibility, and give a case study on how to detect and find DDoS attackers using the proposed estimator.
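
To make the problem definition concrete, the following is a minimal Python sketch, not the estimator proposed in the paper. It computes the exact k-persistent spread of a single flow from per-period element sets, and then contrasts per-position SUM with AND on toy per-period bitmaps. All names (`k_persistent_spread`, `bit_index`, `period_bitmap`, `combine_sum`, `combine_and`) are hypothetical, and reading "bitwise SUM" as per-position addition of bits across periods is an assumption. The sketch ignores hash collisions and virtual bitmap sharing, which the paper's probabilistic analysis is meant to handle.

```python
from collections import Counter
import hashlib


def k_persistent_spread(periods, k):
    """Exact count of distinct elements seen in at least k of the given periods."""
    counts = Counter()
    for elems in periods:
        counts.update(set(elems))  # each element counts at most once per period
    return sum(1 for c in counts.values() if c >= k)


def bit_index(element, m):
    """Map an element to one of m bit positions (illustrative hash choice)."""
    digest = hashlib.sha256(str(element).encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def period_bitmap(elems, m):
    """Build an m-bit bitmap for one measurement period."""
    bits = [0] * m
    for e in elems:
        bits[bit_index(e, m)] = 1
    return bits


def combine_sum(bitmaps):
    """Per-position sum: how many periods set each bit (values in 0..t)."""
    return [sum(col) for col in zip(*bitmaps)]


def combine_and(bitmaps):
    """Per-position AND: 1 only if the bit is set in every period."""
    return [int(all(col)) for col in zip(*bitmaps)]


# Toy flow observed over t = 3 periods (made-up data for illustration only).
periods = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}]
exact = {k: k_persistent_spread(periods, k) for k in (1, 2, 3)}

bitmaps = [period_bitmap(p, 64) for p in periods]
sums = combine_sum(bitmaps)
ands = combine_and(bitmaps)
```

In this toy data, `a` appears in all three periods, `b` in two, and the other elements in one each, so `exact` is `{1: 5, 2: 2, 3: 1}`. The AND result only marks positions set in all t periods, which matches the positions where the sum equals t. The summed vector additionally keeps positions set in exactly 1, 2, ..., t - 1 periods, which is the extra information a query with k < t needs.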
