Anomalous Network Packet Detection Using Data Stream Mining

In recent years, significant research has been devoted to the development of Intrusion Detection Systems (IDS) able to detect anomalous computer network traffic indicative of malicious activity. While signature-based IDS have proven effective in discovering known attacks, anomaly-based IDS hold the even greater promise of being able to automatically detect previously undocumented threats. Traditional IDS are generally trained in batch mode, and therefore cannot adapt to evolving network data streams in real time. To resolve this limitation, data stream mining techniques can be utilized to create a new type of IDS able to dynamically model a stream of network traffic. In this paper, we present two methods for anomalous network packet detection based on the data stream mining paradigm. The first of these is an adapted version of the DenStream algorithm for stream clustering specifically tailored to evaluate network traffic. In this algorithm, individual packets are treated as points and are flagged as normal or abnormal based on their belonging to either normal or outlier clusters. The second algorithm utilizes a histogram to create a model of the evolving network traffic to which incoming traffic can be compared using Pearson correlation. Both of these algorithms were tested using the first week of data from the DARPA ’99 dataset with Generic HTTP, Shell-code and Polymorphic attacks inserted. We were able to achieve reasonably high detection rates with moderately low false positive percentages for different types of attacks, though detection rates varied between the two algorithms. Overall, the histogram-based detection algorithm achieved slightly superior results, but required more parameters than the clustering-based algorithm. As a result of its fewer parameter requirements, the clustering approach can be more easily generalized to different types of network traffic streams.

[1]  Hajime Inoue,et al.  Comparing Anomaly Detection Techniques for HTTP , 2007, RAID.

[2]  Aoying Zhou,et al.  Density-Based Clustering over an Evolving Data Stream with Noise , 2006, SDM.

[3]  Richard Lippmann,et al.  The 1999 DARPA off-line intrusion detection evaluation , 2000, Comput. Networks.

[4]  Wenke Lee,et al.  McPAD: A multiple classifier system for accurate payload-based anomaly detection , 2009, Comput. Networks.

[5]  Philip K. Chan,et al.  Learning nonstationary models of normal network traffic for detecting novel attacks , 2002, KDD.

[6]  Guofei Gu,et al.  Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems , 2006, Sixth International Conference on Data Mining (ICDM'06).

[7]  Salvatore J. Stolfo,et al.  Anomalous Payload-Based Network Intrusion Detection , 2004, RAID.

[8]  Geoff Holmes,et al.  MOA: Massive Online Analysis , 2010, J. Mach. Learn. Res..

[9]  Leonid Portnoy,et al.  Intrusion detection with unlabeled data using clustering , 2000 .

[10]  Jiaowei Tang,et al.  An Algorithm for Streaming Clustering , 2011 .

[11]  Salvatore J. Stolfo,et al.  Network payload-based anomaly detection and content-based alert correlation , 2007 .

[12]  Matthew V. Mahoney,et al.  Network traffic anomaly detection based on packet bytes , 2003, SAC '03.

[13]  K. Mumtaz An Analysis on Density Based Clustering of Multi Dimensional Spatial Data , 2010 .

[14]  Giandomenico Spezzano,et al.  FlockStream: A Bio-Inspired Algorithm for Clustering Evolving Data Streams , 2009, 2009 21st IEEE International Conference on Tools with Artificial Intelligence.

[15]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[16]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[17]  Ian Witten,et al.  Data Mining , 2000 .