A Fast and Efficient Local Outlier Detection in Data Streams

Outlier detection in data streams is used in many applications, such as network flow monitoring, detection of stock trading fluctuations, and network intrusion detection [1]. These applications require algorithms to detect outliers effectively within limited time and memory. Local Outlier Factor (LOF) is a fundamental density-based outlier detection algorithm [2]; it determines whether an object is an outlier by computing an LOF score for each observation. Many LOF-based algorithms achieve excellent results for outlier detection in data streams, but most of them suffer from excessive computation. In this paper, we propose a fast outlier detection algorithm for data streams that uses Z-score pruning to reduce the number of LOF calculations over the data. The algorithm consists of three phases. First, a generator produces prediction data. Second, the Z-score of the residual between the original value and the predicted value is used to judge whether the observed object is a potential outlier. Finally, based on the judgment from the previous step, the LOF of the observed object is calculated within the current time window. Experiments show that Z-score pruning effectively reduces detection time while preserving detection accuracy.
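
The three phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The generator is replaced by simple exponential smoothing, since the abstract does not say which predictor is used. LOF is a plain quadratic-time version on one-dimensional values that takes exactly k neighbours and ignores distance ties. The names `exp_smooth_predictions`, `lof_of`, and `detect`, and the thresholds `z_thresh=2.0` and `lof_thresh=1.5`, are hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def exp_smooth_predictions(values, alpha=0.5):
    """Phase 1 (assumed generator): one-step-ahead exponential smoothing."""
    preds = [values[0]]      # no history for the first point: predict itself
    level = values[0]
    for x in values[1:]:
        preds.append(level)  # forecast for x uses only earlier points
        level = alpha * x + (1 - alpha) * level
    return preds


def lof_of(i, pts, k):
    """LOF of pts[i] within the window (simplified: exactly k neighbours).

    Assumes len(pts) > k. Recomputes neighbourhoods on demand, so it is
    only meant to be called for the few points that survive pruning.
    """
    def knn(a):
        others = (b for b in range(len(pts)) if b != a)
        return sorted(others, key=lambda b: abs(pts[a] - pts[b]))[:k]

    def k_dist(a):
        return abs(pts[a] - pts[knn(a)[-1]])

    def lrd(a):
        # Local reachability density; the epsilon avoids division by zero
        # when all neighbours are exact duplicates.
        reach = [max(k_dist(b), abs(pts[a] - pts[b])) for b in knn(a)]
        return len(reach) / (sum(reach) + 1e-12)

    return mean([lrd(b) for b in knn(i)]) / lrd(i)


def detect(window, k=3, z_thresh=2.0, lof_thresh=1.5, alpha=0.5):
    """Return (outlier indices, number of LOF computations) for one window."""
    preds = exp_smooth_predictions(window, alpha)
    residuals = [x - p for x, p in zip(window, preds)]
    mu, sigma = mean(residuals), pstdev(residuals)

    outliers, lof_calls = [], 0
    for i, r in enumerate(residuals):
        # Phase 2: Z-score of the residual decides whether i is a candidate.
        z = (r - mu) / sigma if sigma > 0 else 0.0
        if abs(z) < z_thresh:
            continue         # pruned: no LOF computation for this point
        # Phase 3: LOF is computed only for the remaining candidates.
        lof_calls += 1
        if lof_of(i, window, k) > lof_thresh:
            outliers.append(i)
    return outliers, lof_calls


window = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1, 9.8, 10.2, 10.0, 9.9, 10.1]
outliers, lof_calls = detect(window)
print(outliers, lof_calls)
```

On this toy window only the spike at index 5 passes the Z-score test, so LOF is evaluated once instead of twelve times. That saving is the effect the pruning step is meant to achieve, although the actual gain depends on the predictor and the threshold.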

[1]  Aleksandar Lazarevic, et al. Incremental Local Outlier Detection for Data Streams, 2007, 2007 IEEE Symposium on Computational Intelligence and Data Mining.

[2]  R. Suganya, et al. Data Mining Concepts and Techniques, 2010.

[3]  Christos Faloutsos, et al. LOCI: Fast Outlier Detection Using the Local Correlation Integral, 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[4]  Fei Tony Liu, et al. Isolation-Based Anomaly Detection, 2012, TKDD.

[5]  Vic Barnett, et al. Outliers in Statistical Data, 1980.

[6]  Peter J. Rousseeuw, et al. Robust Regression and Outlier Detection, 2005, Wiley Series in Probability and Statistics.

[7]  Mahsa Salehi, et al. Fast Memory Efficient Local Outlier Detection in Data Streams, 2017, IEEE Transactions on Knowledge and Data Engineering.

[8]  Jian Tang, et al. Enhancing Effectiveness of Outlier Detections for Low Density Patterns, 2002, PAKDD.

[9]  Héctor Pomares, et al. Hybridization of Intelligent Techniques and ARIMA Models for Time Series Prediction, 2008, Fuzzy Sets Syst.

[10]  Hans-Peter Kriegel, et al. LOF: Identifying Density-Based Local Outliers, 2000, SIGMOD '00.

[11]  Markus Goldstein, et al. FastLOF: An Expectation-Maximization Based Local Outlier Detection Algorithm, 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[12]  Hwanjo Yu, et al. DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams, 2018, KDD.

[13]  Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[14]  Zhi-Hua Zhou, et al. Isolation Forest, 2008, 2008 Eighth IEEE International Conference on Data Mining.

[15]  Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[16]  Daniel T. Larose, et al. Discovering Knowledge in Data: An Introduction to Data Mining, 2005.

[17]  Exponential Smoothing for Irregular Time Series, 2008.

[18]  Douglas M. Hawkins. Identification of Outliers, 1980, Monographs on Applied Probability and Statistics.