An Empirical Evaluation of Entropy-based Anomaly Detection

There is considerable interest in using entropy-based analysis of traffic feature distributions for anomaly detection. Entropy-based metrics are appealing because they provide more fine-grained insight into traffic structure than traditional traffic volume analysis. While previous work has demonstrated the benefits of using the entropy of different traffic distributions in isolation to detect anomalies, there has been little effort to comprehensively understand the detection power provided by entropy-based analysis of multiple traffic distributions used in conjunction with each other. We compare and contrast the anomaly detection capabilities provided by different entropy-based metrics. We consider two classes of distributions: flow-header features (IP addresses, ports, and flow sizes), and behavioral features (the out- and in-degree of hosts, measuring the number of distinct destination/source IP addresses that each host communicates with). Somewhat surprisingly, we observe that the entropies of the address and port distributions are strongly correlated with each other, and also detect very similar anomalies in our traffic trace. The behavioral and flow size distributions appear less correlated and detect incidents that do not show up as anomalies among the port and address distributions. Further analysis using synthetically generated anomalies also suggests that the port and address distributions have limited utility in detecting scan and bandwidth flood anomalies. Based on our results, we derive implications for selecting traffic distributions in entropy-based anomaly detection. In support of the thesis and future work, we present the Datapository Anomaly Detection Testbed, a framework and storage facility for analyzing and developing detection methods, generating and labeling anomalies, and analyzing traffic features with user-provided traffic sets or publicly available traffic sets in the Datapository database. Through the collaboration of future users, we hope to expand the set of available detection methods, synthetic anomaly models, and publicly available traffic data and tools for analysis.

To the Greeks, whose support and dancing gets me through the day.
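As a rough illustration of the kind of metric the abstract refers to, the following minimal sketch computes the Shannon entropy of an empirical traffic feature distribution (here, destination ports) and of a simple out-degree distribution. The flow tuple layout `(src, dst, dst_port)`, the function names, and the sample addresses are assumptions made for this example; they are not taken from the thesis, and the thesis may define or normalize these distributions differently.

```python
import math
from collections import Counter


def empirical_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(values):
    """Entropy divided by log2(number of distinct values), so it lies in [0, 1]."""
    distinct = len(set(values))
    if distinct <= 1:
        return 0.0
    return empirical_entropy(values) / math.log2(distinct)


def out_degrees(flows):
    """Number of distinct destinations contacted by each source host.

    `flows` is an iterable of hypothetical (src, dst, dst_port) tuples.
    """
    peers = {}
    for src, dst, _port in flows:
        peers.setdefault(src, set()).add(dst)
    return [len(dsts) for dsts in peers.values()]


# Hypothetical traffic: four ordinary flows, then the same flows plus one
# host contacting twenty distinct destinations on a single port.
normal = [
    ("10.0.0.1", "10.0.1.1", 80),
    ("10.0.0.2", "10.0.1.2", 443),
    ("10.0.0.3", "10.0.1.1", 80),
    ("10.0.0.4", "10.0.1.3", 53),
]
with_scan = normal + [("10.0.0.9", f"10.0.2.{i}", 22) for i in range(1, 21)]

for label, flows in (("normal", normal), ("with scan", with_scan)):
    port_entropy = empirical_entropy(port for _, _, port in flows)
    degree_entropy = empirical_entropy(out_degrees(flows))
    print(f"{label}: port entropy={port_entropy:.3f}, "
          f"out-degree entropy={degree_entropy:.3f}")
```

In this toy data the added host concentrates the destination-port distribution (lowering its entropy) while making the out-degree values less uniform (raising theirs). A detector along these lines would track such entropy values per time bin and flag large deviations; the thresholding step is not shown here.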