Sequential hashing: A flexible approach for unveiling significant patterns in high speed networks

Identifying significant patterns in network traffic, such as IPs or flows that contribute a large volume (heavy hitters) or that introduce large changes in volume (heavy changers), has many applications in accounting and network anomaly detection. As network speeds and the number of flows grow rapidly, identifying heavy hitters/changers by tracking per-IP or per-flow statistics becomes infeasible because of both the computational overhead and the memory requirements. In this paper, we propose SeqHash, a novel sequential hashing scheme that supports fast and accurate recovery of heavy hitters/changers while requiring only slightly more memory than the theoretical lower bound. SeqHash monitors data traffic using a sketch data structure that can flexibly trade off memory usage against computational overhead over a wide range, which different computer architectures can exploit to optimize overall performance. In addition, we propose statistically efficient algorithms for estimating the values of heavy hitters/changers. Using both mathematical analysis and experimental studies of Internet traces, we demonstrate that SeqHash achieves the same accuracy as existing methods while using much less memory and computational overhead.
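
To make the two definitions concrete, the following is a minimal Python sketch of the naive exact approach that the abstract describes as infeasible at scale: one counter per key. It is not SeqHash and uses no sketch data structure. The function names, the threshold values, and the sample packets are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def volume_per_key(packets):
    """Sum the byte counts per key (e.g. source IP) for one measurement interval."""
    volume = Counter()
    for key, size in packets:
        volume[key] += size
    return volume


def heavy_hitters(packets, threshold):
    """Keys whose total volume in the interval is at least `threshold`."""
    volume = volume_per_key(packets)
    return {k: v for k, v in volume.items() if v >= threshold}


def heavy_changers(prev_packets, curr_packets, threshold):
    """Keys whose volume changed by at least `threshold` between two intervals."""
    prev = volume_per_key(prev_packets)
    curr = volume_per_key(curr_packets)
    # A Counter returns 0 for a missing key, so keys seen in only one
    # interval are handled without special cases.
    changes = {k: curr[k] - prev[k] for k in set(prev) | set(curr)}
    return {k: d for k, d in changes.items() if abs(d) >= threshold}


# Hypothetical (key, bytes) observations for two consecutive intervals.
interval1 = [("10.0.0.1", 900), ("10.0.0.2", 100), ("10.0.0.1", 600), ("10.0.0.3", 50)]
interval2 = [("10.0.0.1", 1400), ("10.0.0.2", 1200), ("10.0.0.3", 60)]

hitters = heavy_hitters(interval1, threshold=1000)
changers = heavy_changers(interval1, interval2, threshold=1000)
print(hitters)   # {'10.0.0.1': 1500}
print(changers)  # {'10.0.0.2': 1100}
```

This baseline keeps state proportional to the number of distinct keys, which is the memory cost that a sketch-based scheme is meant to avoid.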
