Noise Measurement and Removal for Data Streaming Algorithms with Network Applications

Data streaming has many applications on the Internet, including traffic measurement and intrusion detection. These applications rest on data streaming algorithms that extract useful information from network packet streams, estimate the required statistics (such as the frequencies of TCP flows), and feed them to application software. Among such algorithms, counting sketches are the most prevalent. They are very compact, but at the cost of errors in their estimates. The dominant error-control method, widely accepted for more than a decade, is to take the minimum of multiple independent estimates, that is, the estimate with the smallest error. This method yields a positively biased error, which can grow large under stringent performance and resource constraints, yet no existing work has studied this error in depth. This paper investigates the properties of this error, also known as noise, and argues that it can be measured and removed so that the estimates become unbiased. We introduce two new ideas for measuring the noise: d-smallest noise and artificial data items. Based on these two ideas, we propose four noise measurement methods. Mathematical analysis and experiments on real network traces show that removing the measured noise reduces the estimation error to a much lower level than the state of the art achieves.
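The error-control rule described above (take the minimum over several independent estimates) is the count-min sketch. The Python sketch below is a minimal illustration under our own assumptions: it shows why the minimum is still positively biased, and one simple way noise could be measured and subtracted. Here we assume "artificial data items" are keys known never to appear in the stream, so any count they report is pure noise. That interpretation, the helpers `measure_noise` and `debiased_query`, and the synthetic stream are hypothetical and are not the paper's four methods; the d-smallest noise idea is not attempted.

```python
import random
import statistics


class CountMinSketch:
    """Count-min sketch: d rows of w counters, one hashed counter per row."""

    def __init__(self, d: int, w: int, seed: int = 0):
        self.d, self.w = d, w
        rng = random.Random(seed)
        # One random salt per row; hashing (salt, item) gives d roughly
        # independent row hashes. Items are assumed to be ints so that
        # Python's built-in hash is stable across runs.
        self.salts = [rng.getrandbits(64) for _ in range(d)]
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, row: int, item: int) -> int:
        return hash((self.salts[row], item)) % self.w

    def add(self, item: int, count: int = 1) -> None:
        # Every row increments the counter selected by its own hash.
        for r in range(self.d):
            self.rows[r][self._index(r, item)] += count

    def query(self, item: int) -> int:
        # Each row over-counts by the items sharing its counter; the minimum
        # keeps the least noisy row, but its error is still >= 0.
        return min(self.rows[r][self._index(r, item)] for r in range(self.d))


def measure_noise(cms: CountMinSketch, artificial_items) -> float:
    """Hypothetical noise estimate: mean count reported for artificial items
    assumed never to occur in the stream, so every reported unit is noise."""
    return statistics.fmean(cms.query(a) for a in artificial_items)


def debiased_query(cms: CountMinSketch, item: int, noise: float) -> float:
    # Subtract the measured average noise. The result can be negative for
    # small flows; it is only unbiased on average, not per query.
    return cms.query(item) - noise


# Synthetic stream (hypothetical): item i in 1..2000 occurs (i % 7) + 1 times.
true_counts = {i: (i % 7) + 1 for i in range(1, 2001)}
cms = CountMinSketch(d=4, w=64)
for item, count in true_counts.items():
    cms.add(item, count)

# Negative ints never appear in the stream, so they serve as artificial items.
noise = measure_noise(cms, range(-1, -201, -1))

cm_bias = statistics.fmean(cms.query(i) - f for i, f in true_counts.items())
debiased_bias = statistics.fmean(
    debiased_query(cms, i, noise) - f for i, f in true_counts.items()
)
print(f"count-min mean error:      {cm_bias:.1f}")
print(f"noise-removed mean error:  {debiased_bias:.1f}")
```

With 2000 flows squeezed into 64 counters per row, every count-min estimate is inflated by roughly the load of a shared counter, while subtracting the noise measured on the absent keys pulls the average error close to zero. The per-flow spread of the error remains, which is why a single global subtraction is only a starting point.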
