Diamond Sketch: Accurate Per-flow Measurement for Big Streaming Data

Per-flow measurement is a critical issue in computer networks, and one of its key tasks is to count the number of packets in each flow of big streaming data. The literature has shown that sketches are the most memory-efficient data structures for this counting task, and they are widely used in distributed systems. Existing sketches typically use many counters of the same size to record the number of packets in a flow, so every counter must be large enough to accommodate the largest flow. Unfortunately, because most flows are small (i.e., mice flows) and only a few are large (i.e., elephant flows), many counters hold very small values, which wastes memory. Sketches are often stored in fast but expensive memory (e.g., SRAM), so high memory efficiency is critical. To address this issue, we propose a novel sketch, the Diamond sketch. The Diamond sketch is composed of atom sketches, each of which uses small counters. The key idea of Diamond is to dynamically assign an appropriate number of atom sketches to each flow on demand, thereby optimizing memory efficiency. Experimental results show that the Diamond sketch outperforms the best of five typical sketches by up to 508.3 times in terms of relative error while maintaining comparable speed. We have made the source code of all six sketches available on GitHub [1].
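To make the memory-waste argument concrete, the following is a minimal Python sketch of a conventional fixed-counter baseline, the Count-Min sketch, and not of the Diamond sketch itself. The class name, the parameters `depth` and `width`, the helper `small_counter_fraction`, and the synthetic flow names are all assumptions introduced for illustration. It shows that, on a skewed workload, almost every counter holds a value that would fit in a few bits, even though all counters must be wide enough for the largest flow.

```python
import hashlib


class CountMinSketch:
    """Baseline sketch with uniformly sized counters (illustrative only)."""

    def __init__(self, depth=3, width=1024):
        self.depth = depth
        self.width = width
        # One array of counters per hash function.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, flow_id):
        # Derive an independent hash per row by salting with the row number.
        digest = hashlib.blake2b(
            flow_id.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, flow_id, packets=1):
        # Increment the mapped counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(r, flow_id)] += packets

    def estimate(self, flow_id):
        # The minimum over rows never underestimates the true count.
        return min(
            self.rows[r][self._index(r, flow_id)] for r in range(self.depth)
        )


def small_counter_fraction(sketch, bits=4):
    """Fraction of counters whose value would fit in `bits` bits."""
    limit = 1 << bits
    counters = [c for row in sketch.rows for c in row]
    return sum(c < limit for c in counters) / len(counters)


def demo():
    """Hypothetical skewed workload: one elephant flow and many mice flows."""
    sketch = CountMinSketch()
    truth = {"elephant": 50_000}
    truth.update({f"mouse-{i}": 2 for i in range(500)})
    for flow, packets in truth.items():
        sketch.add(flow, packets)
    return sketch, truth


if __name__ == "__main__":
    sketch, truth = demo()
    print("elephant estimate:", sketch.estimate("elephant"))
    print("fraction of counters below 16:", small_counter_fraction(sketch))
```

In this toy workload, every counter would need at least 16 bits to hold the elephant flow, yet nearly all of them stay below 16. This is the inefficiency that assigning small counters on demand is meant to avoid.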

[1] Shigang Chen, et al. Per-Flow Traffic Measurement Through Randomized Counter Sharing, 2012, IEEE/ACM Trans. Netw.

[2] Roy Friedman, et al. A formal analysis of conservative update based approximate counting, 2015, 2015 International Conference on Computing, Networking and Communications (ICNC).

[3] Graham Cormode, et al. Summarizing and Mining Skewed Data Streams, 2005, SDM.

[4] Lada A. Adamic, et al. Power-Law Distribution of the World Wide Web, 2000, Science.

[5] Shrideep Pallickara, et al. Synopsis: A Distributed Sketch over Voluminous Spatiotemporal Observational Streams, 2017, IEEE Transactions on Knowledge and Data Engineering.

[6] Li Fan, et al. Summary cache: a scalable wide-area web cache sharing protocol, 2000, TNET.

[7] Feng Wang, et al. Matching the speed gap between SRAM and DRAM, 2008, 2008 International Conference on High Performance Switching and Routing.

[8] Yossi Matias, et al. Spectral bloom filters, 2003, SIGMOD '03.

[9] Jih-Kwon Peir, et al. Fit a Spread Estimator in Small Memory, 2009, IEEE INFOCOM 2009.

[10] Ken R. Duffy, et al. Modeling conservative updates in multi-hash approximate count sketches, 2012, 2012 24th International Teletraffic Congress (ITC 24).

[11] Graham Cormode, et al. An improved data stream summary: the count-min sketch and its applications, 2004, J. Algorithms.

[12] Andrea Montanari, et al. Counter braids: a novel counter architecture for per-flow measurement, 2008, SIGMETRICS '08.

[13] Graham Cormode, et al. Sketch Techniques for Approximate Query Processing, 2010.

[14] Minlan Yu, et al. FlowRadar: A Better NetFlow for Data Centers, 2016, NSDI.

[15] Viktor K. Prasanna, et al. Sketch Acceleration on FPGA and its Applications in Network Anomaly Detection, 2018, IEEE Transactions on Parallel and Distributed Systems.

[16] Raquel Menezes, et al. Extrema Propagation: Fast Distributed Estimation of Sums and Network Sizes, 2012, IEEE Transactions on Parallel and Distributed Systems.

[17] Xenofontas A. Dimitropoulos, et al. Probabilistic lossy counting: an efficient algorithm for finding heavy hitters, 2008, CCRV.

[18] Josep-Lluís Larriba-Pey, et al. Dynamic count filters, 2006, SGMD.

[19] George Varghese, et al. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice, 2003, TOCS.

[20] Balachander Krishnamurthy, et al. Sketch-based change detection: methods, evaluation, and applications, 2003, IMC '03.

[21] Ramesh Govindan, et al. Detection and identification of network anomalies using sketch subspaces, 2006, IMC '06.

[22] David M. W. Powers, et al. Applications and Explanations of Zipf’s Law, 1998, CoNLL.

[23] Minlan Yu, et al. BUFFALO: bloom filter forwarding architecture for large organizations, 2009, CoNEXT '09.

[24] Burton H. Bloom, et al. Space/time trade-offs in hash coding with allowable errors, 1970, CACM.

[25] Duane Wessels, et al. High-performance benchmarking with Web Polygraph, 2004, Softw. Pract. Exp.

[26] Mukesh K. Mohania, et al. Time-decaying Bloom Filters for data streams with skewed distributions, 2005, 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA'05).

[27] Qi Li, et al. Guarantee IP lookup performance with FIB explosion, 2015, SIGCOMM 2015.

[28] Moses Charikar, et al. Finding frequent items in data streams, 2004, Theor. Comput. Sci.

[29] George Varghese, et al. Efficient implementation of a statistics counter architecture, 2003, SIGMETRICS '03.

[30] Gustavo Alonso, et al. Augmented Sketch: Faster and More Accurate Stream Processing, 2016, SIGMOD Conference.