A fast sketch for aggregate queries over high-speed network traffic

Many security problems and network failures, such as botnets, polymorphic worms and viruses, and DDoS attacks, are hard to resolve. To address them, we need to monitor traffic dynamics, maintain a network-wide view of them, and, more importantly, detect attacks and failures in a timely manner. Because traffic volume is increasing rapidly, space and time constraints often make it infeasible to monitor every individual flow in the backbone network. Instead, packets are often aggregated into a small number of flows, and detection methods are developed over the aggregated flows; we refer to these as aggregate queries. Although flow aggregation enables ISPs to detect network problems in a timely manner, it cannot preserve certain critical information in network traffic, e.g., IP addresses and port numbers. Without this information, it becomes very difficult (and often infeasible) for ISPs to identify the sources of network attacks or the causes of traffic anomalies, both of which are important for resolving network problems effectively. In this paper, we propose an efficient data structure, the fast sketch, which both aggregates packets into a small number of flows and enables ISPs to identify the anomalous keys (IP addresses, port numbers, etc.) using small space and time. With it, the number of aggregated flows matches the lower bound for heavy-change detection, i.e., Ω(k log(n/k)), where n is the range of flow keys and k is an upper bound on the number of anomalous keys. In addition, our sketch combines combinatorial group testing with the quotient technique to identify anomalous keys, which guarantees a sub-linear running time. We expect our work to improve the practice of real-time traffic monitoring in high-speed networked systems.
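
To make the idea of identifying anomalous keys from aggregated counters concrete, the following is a minimal Python sketch of generic combinatorial group testing applied to heavy-change detection. It is an illustration under stated assumptions, not the fast sketch proposed in this paper: it keeps log n + 1 counters per bucket, does not implement the quotient technique, and does not attempt to reach the Ω(k log(n/k)) bound. The class name `GroupTestingSketch`, the hash construction, and all parameters (16-bit keys, 64 buckets, 5 rows, threshold 500) are hypothetical choices made only for this example.

```python
import random


class GroupTestingSketch:
    """Toy group-testing sketch; an illustration, not the paper's fast sketch."""

    PRIME = (1 << 61) - 1  # modulus for the illustrative hash functions

    def __init__(self, key_bits, buckets, rows, seed=0):
        self.key_bits = key_bits
        self.buckets = buckets
        self.rows = rows
        self.seed = seed
        rng = random.Random(seed)
        # One (a, b) pair per row defines a simple hash of the form (a*key + b).
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(rows)]
        # cell[0] is the bucket total; cell[1 + j] sums keys whose bit j is 1.
        self.cells = [[[0] * (key_bits + 1) for _ in range(buckets)]
                      for _ in range(rows)]

    def _bucket(self, row, key):
        a, b = self.hashes[row]
        return ((a * key + b) % self.PRIME) % self.buckets

    def update(self, key, count):
        """Add `count` for `key` to one bucket in every row."""
        for row in range(self.rows):
            cell = self.cells[row][self._bucket(row, key)]
            cell[0] += count
            for j in range(self.key_bits):
                if (key >> j) & 1:
                    cell[1 + j] += count

    def difference(self, other):
        """Sketch of (self - other). Both sketches must share parameters and seed."""
        diff = GroupTestingSketch(self.key_bits, self.buckets, self.rows, self.seed)
        for r in range(self.rows):
            for b in range(self.buckets):
                diff.cells[r][b] = [x - y for x, y in
                                    zip(self.cells[r][b], other.cells[r][b])]
        return diff

    def heavy_keys(self, threshold):
        """Decode one candidate key from every bucket whose total is large."""
        found = set()
        for r in range(self.rows):
            for b, cell in enumerate(self.cells[r]):
                total = cell[0]
                if abs(total) < threshold:
                    continue
                key = 0
                for j in range(self.key_bits):
                    ones = cell[1 + j]
                    # Bit j is 1 if the "bit set" group outweighs the "bit clear" group.
                    if abs(ones) > abs(total - ones):
                        key |= 1 << j
                # Sanity check: the decoded key must hash back to this bucket.
                if self._bucket(r, key) == b:
                    found.add(key)
        return found


def demo():
    rng = random.Random(1)
    before = GroupTestingSketch(key_bits=16, buckets=64, rows=5)
    after = GroupTestingSketch(key_bits=16, buckets=64, rows=5)
    # Background traffic that changes by at most one packet per key.
    for _ in range(50):
        key = rng.randrange(1 << 16)
        count = rng.randint(1, 20)
        before.update(key, count)
        after.update(key, count + rng.choice((-1, 0, 1)))
    # Two planted anomalies: key 4242 surges, key 777 disappears.
    before.update(4242, 10)
    after.update(4242, 1010)
    before.update(777, 800)
    return after.difference(before).heavy_keys(threshold=500)


print(sorted(demo()))
```

Because every update is linear, subtracting the counters of two epochs gives a sketch of the per-key change, and the per-bit counters let a heavy bucket spell out its dominant key without storing any key explicitly. With these toy parameters the two planted keys are the ones expected to be reported; the decoding cost depends on the number of counters rather than on the size of the key space.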
