False positive elimination in intrusion detection based on clustering

To address the high false positive rate of network intrusion detection systems, we adopted two clustering algorithms, the K-means algorithm and the Fuzzy C-Means (FCM) algorithm, to identify false alerts, reduce invalid alerts, and purify the alert stream for better analysis. In this paper, we first introduced typical clustering algorithms, including partitioning clustering, hierarchical clustering, density-based and grid-based clustering, and fuzzy clustering, and then analyzed their feasibility for security data processing. Furthermore, we introduced an intrusion detection framework and tested the validity and feasibility of false positive elimination within it. The steps of the false positive elimination process are described in detail, and the two typical clustering algorithms, K-means and FCM, were implemented to identify and filter false alerts. We also defined three evaluation indexes: the elimination rate, the false elimination rate, and the miss elimination rate. We used the DARPA 2000 LLDOS 1.0 dataset for our experiments and adopted Snort as the intrusion detection system. Finally, the results showed that the proposed method is valid and feasible for false positive elimination, and that the clustering algorithms we adopted achieve a high elimination rate.
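
The abstract states that K-means was used to group alerts so that false ones can be identified and filtered. The sketch below shows only the clustering step, under stated assumptions: each alert has already been turned into a numeric feature vector (the two features used here are hypothetical, not Snort fields), k = 2, and the initial centroids are picked by a deterministic farthest-first rule so the example is reproducible. The paper's feature set, value of k, initialisation, and the rule for deciding which cluster holds false alerts are not given in the abstract and are not modelled here.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two alert feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: List[Sequence[float]], k: int) -> List[List[float]]:
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: List[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd-style K-means; returns (centroids, cluster label per alert)."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each alert joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no alert changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member alerts.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical two-dimensional alert features, for illustration only.
alerts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(alerts, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Farthest-first seeding is used only to make the run deterministic; random or k-means++ seeding would serve equally well in practice.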
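
Unlike K-means, the FCM algorithm named in the abstract gives each alert a membership degree in every cluster. The sketch below implements the standard FCM updates, u_ij = 1 / Σ_l (d_ij / d_il)^(2/(m-1)) for memberships and c_j = Σ_i u_ij^m x_i / Σ_i u_ij^m for centres, assuming a fuzzifier m = 2, caller-supplied initial centres, and the same hypothetical two-feature alerts. These parameter choices are assumptions; the abstract does not state the ones used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two alert feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(points: List[Sequence[float]], centres: List[List[float]],
                m: float) -> List[List[float]]:
    """Membership u[i][j] of alert i in cluster j; each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [distance(p, c) for c in centres]
        if 0.0 in d:
            # The alert coincides with a centre: give it crisp membership there.
            hits = d.count(0.0)
            u.append([1.0 / hits if dj == 0.0 else 0.0 for dj in d])
        else:
            u.append([1.0 / sum((dj / dl) ** exponent for dl in d) for dj in d])
    return u


def fcm(points: List[Sequence[float]], init_centres: List[Sequence[float]],
        m: float = 2.0, tol: float = 1e-6,
        max_iter: int = 300) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternate membership and centre updates until the centres stop moving."""
    centres = [list(c) for c in init_centres]
    dims = len(points[0])
    for _ in range(max_iter):
        u = memberships(points, centres, m)
        new_centres = []
        for j in range(len(centres)):
            w = [row[j] ** m for row in u]  # fuzzified weights u_ij^m
            total = sum(w)
            new_centres.append(
                [sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dims)]
            )
        shift = max(distance(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, memberships(points, centres, m)


# Hypothetical two-dimensional alert features, for illustration only.
alerts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centres, u = fcm(alerts, init_centres=[alerts[0], alerts[3]])
hard = [max(range(len(row)), key=row.__getitem__) for row in u]  # most likely cluster
print(hard)  # [0, 0, 0, 1, 1, 1]
```

The soft memberships are what distinguish FCM here: an alert whose largest membership is low sits between clusters, which a filtering stage could treat differently from a confidently assigned one.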
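
The abstract names three evaluation indexes (the elimination rate, the false elimination rate, and the miss elimination rate) but does not give their formulas. The sketch below assumes one plausible set of definitions over sets of alert IDs: elimination rate as the share of all alerts removed, false elimination rate as the share of removed alerts that were true alerts, and miss elimination rate as the share of false alerts left in place. The paper's actual definitions may differ; the function and variable names are hypothetical.

```python
from typing import Dict, Hashable, Set


def ratio(num: int, den: int) -> float:
    """Safe division: an empty denominator yields 0.0."""
    return num / den if den else 0.0


def evaluation_indexes(all_alerts: Set[Hashable], false_alerts: Set[Hashable],
                       eliminated: Set[Hashable]) -> Dict[str, float]:
    """ASSUMED definitions; the abstract names the indexes but not their formulas."""
    wrongly_eliminated = eliminated - false_alerts  # true alerts that were filtered out
    missed = false_alerts - eliminated              # false alerts that survived filtering
    return {
        # share of all alerts that the filter removed
        "elimination_rate": ratio(len(eliminated), len(all_alerts)),
        # share of removed alerts that were actually true alerts
        "false_elimination_rate": ratio(len(wrongly_eliminated), len(eliminated)),
        # share of false alerts that the filter failed to remove
        "miss_elimination_rate": ratio(len(missed), len(false_alerts)),
    }


all_alerts = set(range(1, 11))     # ten hypothetical alert IDs
false_alerts = {1, 2, 3, 4, 5, 6}  # ground-truth false positives
eliminated = {1, 2, 3, 4, 5, 7}    # alerts the clustering step filtered out
print(evaluation_indexes(all_alerts, false_alerts, eliminated))
```

Under these assumed definitions the three indexes are independent: a filter can score a high elimination rate while still removing true alerts, which the false elimination rate exposes.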
