An optimization process to identify outliers generated by intrusion detection systems

An outlier is an observation that is inconsistent with, or markedly dissimilar from, the other observations in a data set. Outlier detection has been studied in many fields, including network security. Protecting computer networks from attacks usually requires an intrusion detection system (IDS). However, IDSs generate many outliers, which can severely affect their accuracy. In this work, we propose a three-stage method to detect outliers. First, alerts are clustered using the k-means algorithm. Second, the resulting set of meta-alerts is filtered based on the distances between the centroids of the different clusters. Finally, outliers are identified among the filtered meta-alerts using a binary optimization algorithm. Our method is evaluated on data sets from the University of California, Irvine (UCI) Machine Learning Repository and from the Defense Advanced Research Projects Agency (DARPA). Experimental results show that the proposed method outperforms competing methods for outlier detection. Copyright © 2015 John Wiley & Sons, Ltd.
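The three stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the filtering rule or the optimization objective, so everything below other than the use of k-means is an assumption made for illustration. In particular, the isolation score (distance to the nearest other centroid), the mean-isolation threshold, the `max_fraction` budget and the exhaustive 0/1 search are hypothetical stand-ins, and alerts are assumed to be already encoded as numeric feature tuples.

```python
import itertools
import math


def kmeans(points, k, iters=100):
    """Stage 1: plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def assign(cs):
        return [min(range(k), key=lambda i: math.dist(p, cs[i])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(centroids)


def filter_meta_alerts(centroids):
    """Stage 2 (assumed rule): keep clusters whose centroid is more isolated than average."""
    isolation = [
        min(math.dist(c, o) for j, o in enumerate(centroids) if j != i)
        for i, c in enumerate(centroids)
    ]
    threshold = sum(isolation) / len(isolation)
    candidates = [i for i, s in enumerate(isolation) if s > threshold]
    return candidates, isolation


def select_outliers(candidates, sizes, isolation, budget):
    """Stage 3 (assumed objective): exhaustive search over binary vectors.

    Maximises the total isolation of the flagged clusters while flagging at
    most `budget` alerts. Only practical for a handful of candidates.
    """
    best, best_score = (0,) * len(candidates), 0.0
    for x in itertools.product((0, 1), repeat=len(candidates)):
        cost = sum(xi * sizes[c] for xi, c in zip(x, candidates))
        if cost > budget:
            continue  # infeasible: too many alerts would be flagged
        score = sum(xi * isolation[c] for xi, c in zip(x, candidates))
        if score > best_score:
            best, best_score = x, score
    return [c for xi, c in zip(best, candidates) if xi]


def detect_outliers(points, k, max_fraction):
    """Run the three stages and return the alerts in the flagged clusters."""
    centroids, labels = kmeans(points, k)
    candidates, isolation = filter_meta_alerts(centroids)
    sizes = [labels.count(i) for i in range(k)]
    flagged = select_outliers(candidates, sizes, isolation, max_fraction * len(points))
    return sorted(p for p, lab in zip(points, labels) if lab in flagged)


# Toy alerts: two dense groups, one small remote group, one isolated alert.
alerts = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (30, 0), (30, 1), (31, 0),
          (0, 40)]
print(detect_outliers(alerts, k=4, max_fraction=0.2))
```

On this toy input, stage 2 retains the two remote clusters as candidates, and the budget in stage 3 decides how many of them are reported. Raising `max_fraction` lets the search flag the small remote group as well. A realistic objective would have to come from the full paper, and exhaustive enumeration would need to be replaced by a heuristic once the number of candidate meta-alerts grows.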
