Using Gaussian Mixture Models to Detect Outliers in Seasonal Univariate Network Traffic

This article presents an algorithm that uses Gaussian Mixture Models (GMMs) to detect outliers in seasonal, univariate network traffic data. Additionally, we show that the method can easily be implemented in a big data setting and delivers the required information to a security analyst efficiently. GMM, an unsupervised clustering algorithm, is modified so that every data point in a set is labelled either an outlier or a normal data point. Although the algorithm is evaluated here only on time series data obtained from network traffic, it can easily be modified for other seasonal, univariate big data sets. Outlier detection in network traffic data proceeds in two stages. First, a GMM is built from the training data in each time bin of the seasonal time series, and outliers (anomalies) in this training set are detected and removed by examining the probability associated with each data point. Second, the GMMs are rebuilt from the historical (training) data after the outliers have been removed, and the recomputed GMMs are used to detect outliers in test data. The results are compared with traditional outlier detection methods, which usually treat all data in a set as coming from a single probability density function.
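
The abstract does not say how each per-bin mixture is fitted or how "the probability associated with each data point" is turned into a label, so the following is only a minimal Python sketch of the two-stage idea under stated assumptions. It assumes time bins are given as labels (for example, hour of week), fits each bin with a 1-D mixture of a fixed number of components k using plain EM, and flags a point when its two-sided normal tail probability under its most likely component is below alpha, or when that component explains less than min_weight of the bin (EM can otherwise fit a lone spike with its own narrow component). The function names fit_gmm_1d, is_outlier, fit_seasonal and detect, and the defaults k=2, alpha=0.01 and min_weight=0.1, are hypothetical and are not taken from the article.

```python
import math
from collections import defaultdict


def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def fit_gmm_1d(xs, k=2, iters=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by EM; returns a list of (weight, mean, variance)."""
    xs = sorted(xs)
    n = len(xs)
    k = min(k, n)
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, var_floor)
    # Deterministic init: means at evenly spaced order statistics, shared variance.
    comps = [(1.0 / k, xs[int((j + 0.5) * n / k)], var0) for j in range(k)]
    for _ in range(iters):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for x in xs:
            logs = [math.log(w) + log_normal_pdf(x, mu, v) for w, mu, v in comps]
            top = max(logs)
            ps = [math.exp(l - top) for l in logs]
            total = sum(ps)
            resp.append([p / total for p in ps])
        # M-step: re-estimate the weight, mean and variance of each component.
        new = []
        for j, old in enumerate(comps):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:  # component lost all its points: keep it unchanged
                new.append(old)
                continue
            mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            v = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
            new.append((nj / n, mu, max(v, var_floor)))
        comps = new
    return comps


def is_outlier(x, comps, alpha=0.01, min_weight=0.1):
    """Hypothetical rule: flag x if it is improbable under its most likely
    component, or if that component carries only a small share of the bin."""
    w, mu, v = max(comps, key=lambda c: math.log(c[0]) + log_normal_pdf(x, c[1], c[2]))
    tail = math.erfc(abs(x - mu) / math.sqrt(2 * v))  # two-sided normal tail probability
    return tail < alpha or w < min_weight


def fit_seasonal(train, k=2, **rule):
    """Stage 1: fit each time bin and drop flagged points. Stage 2: refit on the rest."""
    bins = defaultdict(list)
    for b, x in train:
        bins[b].append(x)
    models = {}
    for b, xs in bins.items():
        first = fit_gmm_1d(xs, k)
        clean = [x for x in xs if not is_outlier(x, first, **rule)]
        models[b] = fit_gmm_1d(clean or xs, k)
    return models


def detect(test, models, **rule):
    """Label each (bin, value) test point; True means outlier."""
    return [is_outlier(x, models[b], **rule) for b, x in test]


# Toy data (invented for illustration): bin 0 is bimodal with one spike, bin 1 is unimodal.
train = ([(0, v) for v in range(95, 106)] + [(0, v) for v in range(195, 206)]
         + [(0, 1000)] + [(1, v) for v in range(500, 511)])
models = fit_seasonal(train)
print(detect([(0, 101), (0, 150), (0, 505), (1, 505)], models))  # -> [False, True, True, False]
```

Calling fit_gmm_1d with k=1 reduces the sketch to a single Gaussian per set, roughly the kind of baseline the abstract contrasts against. On the toy bin 0 above, a single Gaussian is centred near 150, so a value of 150 lying in the gap between the two traffic levels looks typical to it, while the two-component mixture flags it. The minimum-weight rule is a design assumption: it would also flag points from a genuine but rare mode, so its threshold would need tuning on real traffic. In practice, a library implementation such as scikit-learn's GaussianMixture could replace the hand-written EM.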
