Ultrafast Local Outlier Detection from a Data Stream with Stationary Region Skipping

Real-time outlier detection from a data stream is an increasingly important problem, especially as sensor-generated data streams abound in many applications owing to the prevalence of IoT and the emergence of digital twins. Several density-based approaches have been proposed to address this problem, but arguably none of them is fast enough to meet the performance demands of real applications. This paper is founded upon a novel observation that, in many regions of the data space, data distributions hardly change across window slides. We propose a new algorithm, called STARE, which identifies local regions in which data distributions hardly change and then skips updating the densities in those regions, a notion called stationary region skipping. Two techniques, data distribution approximation and cumulative net-change-based skip, are employed to implement this notion efficiently and effectively. Extensive experiments using synthetic and real data streams, as well as a case study, show that STARE is several orders of magnitude faster than the existing algorithms while achieving comparable or higher accuracy.
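To make the notion of stationary region skipping concrete, the following is a minimal Python sketch, not the STARE implementation. It assumes that the data space is partitioned into equal-width grid cells, that a cell's point count stands in for its density, and that a cell is refreshed only when the net change in its count, accumulated since its last refresh, exceeds a threshold. The names `SkippingGrid`, `cell_of`, `cell_width`, and `skip_threshold` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def cell_of(point, cell_width):
    """Map a point to the index of the grid cell that contains it."""
    return tuple(int(x // cell_width) for x in point)


class SkippingGrid:
    """Illustrative grid that skips density updates in near-stationary cells.

    Assumption: a cell's point count is used as a stand-in for its density;
    a real detector would use a proper density estimate instead.
    """

    def __init__(self, cell_width, skip_threshold):
        self.cell_width = cell_width
        self.skip_threshold = skip_threshold
        self.counts = defaultdict(int)          # current number of points per cell
        self.cum_net_change = defaultdict(int)  # net change since the last refresh
        self.density = {}                       # cached (possibly stale) density per cell

    def slide(self, expired, arrived):
        """Apply one window slide and return the sorted list of refreshed cells."""
        # Net change per cell for this slide: arrivals minus expirations.
        net = defaultdict(int)
        for p in arrived:
            net[cell_of(p, self.cell_width)] += 1
        for p in expired:
            net[cell_of(p, self.cell_width)] -= 1

        refreshed = []
        for cell, delta in net.items():
            self.counts[cell] += delta
            self.cum_net_change[cell] += delta
            # Refresh only if the accumulated net change is large enough;
            # otherwise the cell is treated as stationary and skipped.
            if abs(self.cum_net_change[cell]) > self.skip_threshold:
                self.density[cell] = self.counts[cell]
                self.cum_net_change[cell] = 0
                refreshed.append(cell)
        return sorted(refreshed)
```

In this sketch, a slide that removes one point from a cell and adds another to the same cell yields a net change of zero, so the cached density of that cell is left untouched. Accumulating the net change across slides keeps small per-slide changes from being ignored indefinitely: once their sum crosses the threshold, the cell is refreshed.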
