Fast Anomaly Detection in Multiple Multi-Dimensional Data Streams

Multiple multi-dimensional data streams are ubiquitous in the modern world, arising in IoT applications, GIS applications and social networks. Detecting anomalies in such data streams in real time is an important and challenging task: it extracts valuable information from the data and thereby assists decision-making. However, existing approaches for anomaly detection in multi-dimensional data streams have not properly considered the correlations among multiple multi-dimensional streams. Moreover, for multi-dimensional streaming data, online detection speed is often an important concern. In this paper, we propose a fast yet effective approach to anomaly detection in multiple multi-dimensional data streams. It combines three ideas: stream pre-processing, locality-sensitive hashing and a dynamic isolation forest. Experiments on real datasets demonstrate that our approach achieves an order-of-magnitude improvement in efficiency over state-of-the-art approaches while maintaining competitive detection accuracy.
