SA Sketch: A self‐adaption sketch framework for high‐speed network

A sketch is a compact data structure used to summarize data streams. Sketches are widely used in network traffic measurement, where they achieve higher accuracy than traditional methods. Typical examples include the Count‐Min Sketch, the CU Sketch, and the Count Sketch. Based on the characteristics of network traffic, we propose a new sketch framework called Self‐Adaption Sketch (SA Sketch), which combines a sketch with a Bloom filter. In this framework, the sketch is created dynamically, and its memory space is adjusted in a timely manner according to the network traffic using the concept of carrying. Our experimental results show that space utilization and accuracy are significantly improved, while the throughput of the Self‐Adaption Sketch remains at a relatively good level.
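For readers unfamiliar with the baseline structures named above, the following is a minimal sketch of a plain Count‐Min Sketch in Python. It is not the SA Sketch framework and does not include the Bloom filter or the dynamic memory adjustment described in the abstract. The class name, the default `width` and `depth`, and the use of salted BLAKE2b as the hash family are illustrative assumptions, not details taken from the paper.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch: depth rows of width counters each."""

    def __init__(self, width=1024, depth=4):
        # width and depth are assumed defaults, chosen only for illustration.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key, count=1):
        # Increment the one counter selected in each row.
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def query(self, key):
        # The minimum over the rows is the estimate; collisions can only inflate it.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


cms = CountMinSketch()
for flow in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    cms.update(flow)
estimate = cms.query("10.0.0.1")
```

With non-negative updates, the estimate returned by `query` is never below the true count, because hash collisions can only add to a counter. The memory of this baseline is fixed at construction time, which is the limitation a self-adapting design aims to relax.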
