Hashing Pursuit for Online Identification of Heavy-Hitters in High-Speed Network Streams

Distributed Denial of Service (DDoS) attacks have become more prominent recently, both in frequency of occurrence, as well as magnitude. Such attacks render key Internet resources unavailable and disrupt its normal operation. It is therefore of paramount importance to quickly identify malicious Internet activity. The DDoS threat model includes characteristics such as: (i) heavy-hitters that transmit large volumes of traffic towards "victims", (ii) persistent-hitters that send traffic, not necessarily large, to specific destinations to be used as attack facilitators, (iii) host and port scanning for compiling lists of un-secure servers to be used as attack amplifiers, etc. This conglomeration of problems motivates the development of space/time efficient summaries of data traffic streams that can be used to identify heavy-hitters associated with the above attack vectors. This paper presents a hashing-based framework and fast algorithms that take into account the large-dimensionality of the incoming network stream and can be employed to quickly identify the culprits. The algorithms and data structures proposed provide a synopsis of the network stream that is not taxing to fast-memory, and can be efficiently implemented in hardware due to simple bit-wise operations. The methods are evaluated using real-world Internet data from a large academic network.

[1]  Balachander Krishnamurthy,et al.  Sketch-based change detection: methods, evaluation, and applications , 2003, IMC '03.

[2]  Marina Thottan,et al.  Anomaly Detection Approaches for Communication Networks , 2010, Algorithms for Next Generation Networks.

[3]  Christian Rossow,et al.  Amplification Hell: Revisiting Network Protocols for DDoS Abuse , 2014, NDSS.

[4]  R. Vershynin,et al.  One sketch for all: fast algorithms for compressed sensing , 2007, STOC '07.

[5]  Graham Cormode,et al.  What's new: finding significant differences in network data streams , 2004, INFOCOM 2004.

[6]  Yan Gao,et al.  A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks , 2006, 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06).

[7]  Shai Halevi,et al.  Invertible Universal Hashing and the TET Encryption Mode , 2007, CRYPTO.

[8]  Joel A. Tropp,et al.  Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit , 2007, IEEE Transactions on Information Theory.

[9]  S. Muthukrishnan,et al.  Data streams: algorithms and applications , 2005, SODA '03.

[10]  George Kollios,et al.  Norm, Point, and Distance Estimation Over Multiple Signals Using Max-Stable Distributions , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[11]  Hisashi Kashima,et al.  Eigenspace-based anomaly detection in computer systems , 2004, KDD.

[12]  Divesh Srivastava,et al.  Finding Hierarchical Heavy Hitters in Data Streams , 2003, VLDB.

[13]  Eric Wustrow,et al.  Internet background radiation revisited , 2010, IMC '10.

[14]  Robert Beverly,et al.  The spoofer project: inferring the extent of source address filtering on the internet , 2005 .

[15]  Ely Porat,et al.  Approximate Sparse Recovery: Optimizing Time and Measurements , 2012, SIAM J. Comput..

[16]  Paul Barford,et al.  A signal analysis of network traffic anomalies , 2002, IMW '02.

[17]  Divesh Srivastava,et al.  Holistic UDAFs at streaming speeds , 2004, SIGMOD '04.

[18]  Piotr Indyk Explicit constructions for compressed sensing of sparse signals , 2008, SODA '08.

[19]  Ming-Yang Kao,et al.  Reverse Hashing for High-Speed Network Monitoring: Algorithms, Evaluation, and Applications , 2006, Proceedings IEEE INFOCOM 2006. 25TH IEEE International Conference on Computer Communications.

[20]  Joel A. Tropp,et al.  Algorithmic linear dimension reduction in the l_1 norm for sparse vectors , 2006, ArXiv.

[21]  Robert S. Boyer,et al.  MJRTY: A Fast Majority Vote Algorithm , 1991, Automated Reasoning: Essays in Honor of Woody Bledsoe.

[22]  Graham Cormode,et al.  An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.

[23]  Graham Cormode,et al.  What's hot and what's not: tracking most frequent items dynamically , 2003, TODS.

[24]  David L Donoho,et al.  Compressed sensing , 2006, IEEE Transactions on Information Theory.

[25]  Ely Porat,et al.  Sublinear time, measurement-optimal, sparse recovery for all , 2012, SODA.

[26]  Christophe Diot,et al.  Diagnosing network-wide traffic anomalies , 2004, SIGCOMM.

[27]  Joan Feigenbaum,et al.  Learning-based anomaly detection in BGP updates , 2005, MineNet '05.