Hierarchical Heavy Hitters with the Space Saving Algorithm

The Hierarchical Heavy Hitters problem extends the notion of frequent items to data arranged in a hierarchy. This problem has applications to network traffic monitoring, anomaly detection, and DDoS detection. We present a new streaming approximation algorithm for computing Hierarchical Heavy Hitters that has several advantages over previous algorithms. It improves on the worst-case time and space bounds of earlier algorithms, is conceptually simple and substantially easier to implement, offers improved accuracy guarantees, is easily adopted to a distributed or parallel setting, and can be efficiently implemented in commodity hardware such as ternary content addressable memory (TCAMs). We present experimental results showing that for parameters of primary practical interest, our two-dimensional algorithm is superior to existing algorithms in terms of speed and accuracy, and competitive in terms of space, while our one-dimensional algorithm is also superior in terms of speed and accuracy for a more limited range of parameters.

[1]  Lap-Kei Lee,et al.  A simpler and more efficient deterministic scheme for finding frequent items over sliding windows , 2006, PODS '06.

[2]  Fabrice Guillemin,et al.  Identification of heavyweight address prefix pairs in IP traffic , 2009, 2009 21st International Teletraffic Congress.

[3]  Csaba D. Tóth,et al.  Space complexity of hierarchical heavy hitters in multi-dimensional data streams , 2005, PODS '05.

[4]  Divyakant Agrawal,et al.  Fast data stream algorithms using associative memories , 2007, SIGMOD '07.

[5]  L. Moser,et al.  AN EXTREMAL PROBLEM IN GRAPH THEORY , 2001 .

[6]  Divesh Srivastava,et al.  Diamond in the rough: finding Hierarchical Heavy Hitters in multi-dimensional data , 2004, SIGMOD '04.

[7]  Cristian Estan,et al.  Interactive Traffic Analysis and Visualization with Wisconsin Netpy , 2005, LISA.

[8]  P. Erdös On an extremal problem in graph theory , 1970 .

[9]  Piotr Indyk,et al.  Space-optimal heavy hitters with strong error bounds , 2010, TODS.

[10]  Jennifer Widom,et al.  Adaptive filters for continuous queries over distributed data streams , 2003, SIGMOD '03.

[11]  Divesh Srivastava,et al.  Finding hierarchical heavy hitters in streaming data , 2008, TKDD.

[12]  Minlan Yu,et al.  Online Measurement of Large Traffic Aggregates on Commodity Switches , 2011, Hot-ICE.

[13]  Christopher Olston,et al.  Finding (recently) frequent items in distributed data streams , 2005, 21st International Conference on Data Engineering (ICDE'05).

[14]  Marios Hadjieleftheriou,et al.  Finding frequent items in data streams , 2008, Proc. VLDB Endow..

[15]  Vyas Sekar,et al.  LADS: Large-scale Automated DDoS Detection System , 2006, USENIX Annual Technical Conference, General Track.

[16]  Divesh Srivastava,et al.  Finding Hierarchical Heavy Hitters in Data Streams , 2003, VLDB.

[17]  Moses Charikar,et al.  Finding frequent items in data streams , 2002, Theor. Comput. Sci..

[18]  Divyakant Agrawal,et al.  Efficient Computation of Frequent and Top-k Elements in Data Streams , 2005, ICDT.

[19]  S. Muthukrishnan,et al.  Data streams: algorithms and applications , 2005, SODA '03.

[20]  Carsten Lund,et al.  Online identification of hierarchical heavy hitters: algorithms, evaluation, and applications , 2004, IMC '04.

[21]  Hongyan Liu,et al.  Separator: Sifting Hierarchical Heavy Hitters Accurately from Data Streams , 2007, ADMA.

[22]  George Varghese,et al.  Automatically inferring patterns of resource consumption in network traffic , 2003, SIGCOMM '03.

[23]  Graham Cormode,et al.  An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.