MV-Sketch: A Fast and Compact Invertible Sketch for Heavy Flow Detection in Network Data Streams

Fast detection of heavy flows (e.g., heavy hitters and heavy changers) in massive network traffic is challenging because packets must be processed at line rate with limited resources. Invertible sketches are summary data structures that can recover heavy flows with small memory footprints and bounded errors, yet existing invertible sketches incur high memory access overhead, which degrades performance. We present MV-Sketch, a fast and compact invertible sketch that supports heavy flow detection with a small, static memory allocation. MV-Sketch tracks candidate heavy flows inside the sketch data structure via majority voting, so that it incurs small memory access overhead in both update and query operations while achieving high detection accuracy. We present a theoretical analysis of the memory usage, performance, and accuracy of MV-Sketch. Trace-driven evaluation shows that MV-Sketch achieves higher accuracy than existing invertible sketches, with up to $3.38\times$ throughput gain. We also show how to boost the performance of MV-Sketch with SIMD instructions.
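To make the majority-voting idea concrete, here is a minimal Python sketch of how candidate heavy flows could be tracked inside a sketch's buckets. The abstract does not specify the bucket layout or the hash functions, so everything here is an assumption for illustration: the class name `MajorityVoteSketch`, the three-field bucket (total, candidate key, vote counter), the salted BLAKE2b hashing, and the default sizes are hypothetical and not taken from the authors' implementation.

```python
import hashlib


class MajorityVoteSketch:
    """Illustrative invertible sketch; layout and names are assumptions."""

    def __init__(self, rows=4, cols=1024):
        self.rows = rows
        self.cols = cols
        # Each bucket is [total, candidate_key, votes]; memory is fixed up front.
        self.buckets = [[[0, None, 0] for _ in range(cols)] for _ in range(rows)]

    def _index(self, row, key):
        # Assumed hash: BLAKE2b salted with the row number (one hash per row).
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.cols

    def update(self, key, value=1):
        """Add `value` (e.g., packet bytes) for flow `key`; touches one bucket per row."""
        for r in range(self.rows):
            bucket = self.buckets[r][self._index(r, key)]
            bucket[0] += value              # total of all flows hashed here
            if bucket[1] == key:
                bucket[2] += value          # vote for the current candidate
            else:
                bucket[2] -= value          # vote against the current candidate
                if bucket[2] < 0:           # candidate outvoted: replace it
                    bucket[1] = key
                    bucket[2] = -bucket[2]

    def estimate(self, key):
        """Upper-bound estimate of the flow's sum: minimum over the rows."""
        best = None
        for r in range(self.rows):
            total, cand, votes = self.buckets[r][self._index(r, key)]
            if cand == key:
                est = (total + votes) // 2
            else:
                est = (total - votes) // 2
            best = est if best is None else min(best, est)
        return best

    def heavy_hitters(self, threshold):
        """Recover candidates from the buckets themselves (the invertible part)."""
        found = {}
        for row in self.buckets:
            for total, cand, _ in row:
                if cand is None or total < threshold or cand in found:
                    continue
                est = self.estimate(cand)
                if est >= threshold:
                    found[cand] = est
        return found


# Usage: one heavy flow mixed with many one-packet flows.
sketch = MajorityVoteSketch(rows=4, cols=64)
for i in range(500):
    sketch.update("10.0.0.1")
    if i < 200:
        sketch.update(f"192.168.1.{i}")
print(sketch.heavy_hitters(threshold=400))
```

Under these assumptions, an update touches exactly one bucket per row and a query scans a fixed-size array, which is the kind of small, static memory access pattern the abstract describes. A heavy-changer detector would need a second structure (for example, comparing two such sketches across epochs), which this sketch does not attempt.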
