A multi-step outlier-based anomaly detection approach to network-wide traffic

Abstract

Outlier detection is of considerable interest in fields such as the physical sciences, medical diagnosis, surveillance, fraud detection and network anomaly detection. The data mining and network management research communities continue to refine score-based network traffic anomaly detection techniques because there is ample scope to improve their performance. In this paper, we present a multi-step outlier-based approach for detecting anomalies in network-wide traffic. We identify a subset of relevant traffic features and use it during clustering and anomaly detection. The approach comprises three modules: a feature selection technique based on mutual information and generalized entropy that selects a relevant, non-redundant subset of features; a tree-based clustering technique that generates a set of reference points; and an outlier score function that ranks incoming network traffic to identify anomalies. We also design a fast distributed feature extraction and data preparation framework to extract features from raw network-wide traffic. We evaluate our approach in terms of detection rate, false positive rate, precision, recall and F-measure on several high-dimensional synthetic and real-world datasets and find that it outperforms competing algorithms.
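
The three stages named in the abstract (feature selection, reference points, outlier scoring) and the evaluation measures can be illustrated with a minimal sketch. It is not the paper's method: the function names are hypothetical, plain mutual information with the class label stands in for the mutual information and generalized entropy criterion, the reference points are supplied by hand rather than produced by tree-based clustering, and the score is simply the distance to the nearest reference point with an assumed threshold of 3.0.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def rank_features(rows, labels):
    """Rank feature indices by mutual information with the class label.

    Assumption: relevance only; redundancy between features is ignored.
    """
    n_features = len(rows[0])
    scores = [mutual_information([r[j] for r in rows], labels)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def outlier_score(point, reference_points):
    """Distance to the nearest reference point; larger is more anomalous."""
    return min(math.dist(point, ref) for ref in reference_points)


def evaluate(y_true, y_pred):
    """Detection rate (recall), false positive rate, precision, F-measure."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"detection_rate": recall, "false_positive_rate": fpr,
            "precision": precision, "f_measure": f_measure}


# Toy discretized features: feature 0 copies the label, the others carry
# no information about it.
rows = [(0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1)]
labels = [0, 0, 1, 1]
ranking = rank_features(rows, labels)

# Toy numeric traffic points scored against hand-picked reference points.
reference_points = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.0), (10.0, 11.0), (5.0, 5.0), (20.0, 20.0)]
threshold = 3.0  # assumed cut-off, not taken from the paper
predictions = [int(outlier_score(p, reference_points) > threshold)
               for p in points]
metrics = evaluate([0, 1, 1, 1], predictions)
```

In this toy run the detection rate and recall are the same quantity, and the F-measure is the harmonic mean of precision and recall. A fuller sketch would penalize redundancy among selected features and derive the reference points from clustering.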

[1]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[2]  Zhen Liu,et al.  A class-oriented feature selection approach for multi-class imbalanced network traffic datasets based on local and global metrics fusion , 2015, Neurocomputing.

[3]  Clara Pizzuti,et al.  Distance-based detection and prediction of outliers , 2006, IEEE Transactions on Knowledge and Data Engineering.

[4]  Rachid Beghdad,et al.  Critical Study of Supervised Learning Techniques in Predicting Attacks , 2010, Inf. Secur. J. A Glob. Perspect..

[5]  Charu C. Aggarwal,et al.  Classification and Adaptive Novel Class Detection of Feature-Evolving Data Streams , 2013, IEEE Transactions on Knowledge and Data Engineering.

[6]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[7]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[8]  Richard R. Brooks,et al.  Deceiving entropy based DoS detection , 2015, Comput. Secur..

[9]  Anne E. James,et al.  Improving network intrusion detection system performance through quality of service configuration and parallel technology , 2015, J. Comput. Syst. Sci..

[10]  Michael Georgiopoulos,et al.  A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes , 2010, Data Mining and Knowledge Discovery.

[11]  Chih-Fong Tsai,et al.  A triangle area based nearest neighbors approach to intrusion detection , 2010, Pattern Recognit..

[12]  Nerijus Paulauskas,et al.  Local outlier factor use for the network flow anomaly detection , 2015, Secur. Commun. Networks.

[13]  Juan-Zi Li,et al.  A multi-objective evolutionary algorithm for feature selection based on mutual information with a new redundancy measure , 2015, Inf. Sci..

[14]  Marina Thottan,et al.  Anomaly detection in IP networks , 2003, IEEE Trans. Signal Process..

[15]  Philippe Owezarski,et al.  Unsupervised Network Intrusion Detection Systems: Detecting the Unknown without Knowledge , 2012, Comput. Commun..

[16]  Edward Hung,et al.  Mining Outliers with Faster Cutoff Update and Space Utilization , 2009, PAKDD.

[17]  Witold Pedrycz,et al.  Global and local structure preserving sparse subspace learning: An iterative approach to unsupervised feature selection , 2015, Pattern Recognit..

[18]  Peter J. Rousseeuw,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1990 .

[19]  Andrew H. Sung,et al.  Intrusion detection using neural networks and support vector machines , 2002, Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290).

[20]  Pang-Ning Tan,et al.  Outrank: a Graph-Based Outlier Detection Framework Using Random Walk , 2008, Int. J. Artif. Intell. Tools.

[21]  Clayton R. Pereira,et al.  A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks , 2015, Inf. Sci..

[22]  Ji Zhang,et al.  Detecting anomalies from big network traffic data using an adaptive detection approach , 2015, Inf. Sci..

[23]  Maurice K. Wong,et al.  Algorithm AS136: A k-means clustering algorithm. , 1979 .

[24]  Jugal K. Kalita,et al.  Surveying Port Scans and Their Detection Methodologies , 2011, Comput. J..

[25]  Shi-Jinn Horng,et al.  A novel intrusion detection system based on hierarchical clustering and support vector machines , 2011, Expert Syst. Appl..

[26]  Stephen D. Bay,et al.  Mining distance-based outliers in near linear time with randomization and a simple pruning rule , 2003, KDD '03.

[27]  Brian Swingle,et al.  Rényi entropy, mutual information, and fluctuation properties of Fermi liquids , 2010, 1007.4825.

[28]  Dhruba K. Bhattacharyya,et al.  Network Anomaly Detection: A Machine Learning Perspective , 2013 .

[29]  Hiroki Takakura,et al.  Toward a more practical unsupervised anomaly detection system , 2013, Inf. Sci..

[30]  Leonid Portnoy,et al.  Intrusion detection with unlabeled data using clustering , 2000 .

[31]  Srinivasan Parthasarathy,et al.  Distance-based outlier detection , 2010, Proc. VLDB Endow..

[32]  Joel J. P. C. Rodrigues,et al.  Autonomous profile-based anomaly detection system using principal component analysis and flow analysis , 2015, Appl. Soft Comput..

[33]  Sushil Jajodia,et al.  Detecting Novel Network Intrusions Using Bayes Estimators , 2001, SDM.

[34]  Dana Kulic,et al.  An evaluation of classifier-specific filter measure performance for feature selection , 2015, Pattern Recognit..

[35]  Hui Wang,et al.  A clustering-based method for unsupervised intrusion detections , 2006, Pattern Recognit. Lett..

[36]  Nasser Yazdani,et al.  Mutual information-based feature selection for intrusion detection systems , 2011, J. Netw. Comput. Appl..

[37]  Jiawei Han,et al.  CLARANS: A Method for Clustering Objects for Spatial Data Mining , 2002, IEEE Trans. Knowl. Data Eng..

[38]  Mohammad Zulkernine,et al.  Anomaly Based Network Intrusion Detection with Unsupervised Outlier Detection , 2006, 2006 IEEE International Conference on Communications.

[39]  Raymond T. Ng,et al.  Distance-based outliers: algorithms and applications , 2000, The VLDB Journal.

[40]  Osmar R. Zaïane,et al.  An Efficient Reference-Based Approach to Outlier Detection in Large Datasets , 2006, Sixth International Conference on Data Mining (ICDM'06).

[41]  Jugal K. Kalita,et al.  Towards Generating Real-life Datasets for Network Intrusion Detection , 2015, Int. J. Netw. Secur..

[42]  Tao Li,et al.  Novel heuristic dual-ant clustering algorithm for network intrusion outliers detection , 2015 .

[43]  Ankur Agrawal,et al.  Local Subspace Based Outlier Detection , 2009, IC3.

[44]  Jugal K. Kalita,et al.  NADO: network anomaly detection using outlier approach , 2011, ICCCS '11.

[45]  Chih-Fong Tsai,et al.  CANN: An intrusion detection system based on combining cluster centers and nearest neighbors , 2015, Knowl. Based Syst..

[46]  Azuraliza Abu Bakar,et al.  Outlier detection based on rough sets theory , 2009, Intell. Data Anal..

[47]  Wei Jiang,et al.  On-line outlier detection and data cleaning , 2004, Comput. Chem. Eng..

[48]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD 2000.

[49]  Jong-Seok Lee,et al.  A precise ranking method for outlier detection , 2015, Inf. Sci..

[50]  Salvatore J. Stolfo,et al.  Data Mining Approaches for Intrusion Detection , 1998, USENIX Security Symposium.

[51]  Su Yang,et al.  LDBOD: A novel local distribution based outlier detector , 2008, Pattern Recognit. Lett..

[52]  Dong Hyun Jeong,et al.  A multi-level intrusion detection method for abnormal network behaviors , 2016, J. Netw. Comput. Appl..

[53]  Vahab Mirrokni,et al.  Overlapping clusters for distributed computation , 2012, WSDM '12.

[54]  J. Bezdek,et al.  FCM: The fuzzy c-means clustering algorithm , 1984 .

[55]  Francesco Palmieri,et al.  An uncertainty-managing batch relevance-based approach to network anomaly detection , 2015, Appl. Soft Comput..

[56]  J. Zhan,et al.  A Novel Outlier Detection Scheme for Network Intrusion Detection Systems , 2008, 2008 International Conference on Information Security and Assurance (isa 2008).