TADILOF: Time Aware Density-Based Incremental Local Outlier Detection in Data Streams

Outlier detection in data streams is crucial to successful data mining. However, this task is becoming increasingly difficult as the expansion of the Internet of Things (IoT) drives enormous growth in the quantity of data generated. Recent outlier detection methods based on the density-based local outlier factor (LOF) do not consider how the data change over time; for example, a new cluster of data points may emerge in the stream. To overcome this issue, we present a novel algorithm for streaming data, referred to as time-aware density-based incremental local outlier detection (TADILOF). In addition, we develop a means of estimating the LOF score, termed "approximate LOF," based on historical information that remains after outdated data have been removed. Experimental results demonstrate that TADILOF outperforms current state-of-the-art methods in terms of AUC while achieving comparable execution time. Moreover, we present an application of the proposed scheme to the development of an air-quality monitoring system.
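For readers unfamiliar with LOF, the following is a minimal Python sketch of the standard batch LOF score as originally defined for static data, plus a naive sliding-window wrapper that rescores each arriving point against only the most recent points and discards everything older. It is not TADILOF and does not implement the approximate LOF described above; the names `lof_scores` and `windowed_lof` and the parameters `k` and `window` are hypothetical, chosen for illustration. It assumes Euclidean distance and that no point has `k` or more exact duplicates, so every mean reachability distance is non-zero.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def lof_scores(data: Sequence[Point], k: int) -> List[float]:
    """Batch LOF for every point (hypothetical helper, O(n^2) distances).

    Assumes len(data) > k and that no point has k or more exact duplicates.
    """
    n = len(data)
    if n <= k:
        raise ValueError("need more than k points")
    dist = [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]

    k_dist: List[float] = []
    neighbors: List[List[int]] = []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # k-distance: distance to the k-th nearest neighbour
        k_dist.append(kd)
        # N_k(p) contains every point within the k-distance, including ties
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist_k(p, o) = max(k-distance(o), d(p, o)).
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF(p): mean ratio of the neighbours' lrd to p's own lrd (about 1 for inliers).
    return [
        sum(lrd[o] for o in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


def windowed_lof(stream: Iterable[Point], window: int, k: int) -> Iterator[float]:
    """Naive streaming baseline (hypothetical): score each arriving point
    against the `window` most recent points, dropping older ones outright."""
    recent: Deque[Point] = deque(maxlen=window)
    for p in stream:
        recent.append(p)
        if len(recent) > k:
            yield lof_scores(list(recent), k)[-1]
        else:
            yield float("nan")  # not enough history to score yet


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 9.0)]
    print(lof_scores(pts, k=2))
```

In this example the four clustered points score 1.0 and the isolated point `(10.0, 9.0)` scores roughly 12.4. The window wrapper forgets old points entirely, which is the kind of information loss that motivates estimating LOF from historical information instead; how TADILOF does that is not reproduced here.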