RADAR: Reactive Concept Drift Management with Robust Variational Inference for Evolving IoT Data Streams

The accuracy and performance of Machine Learning (ML) models can degrade gradually or even suddenly when the underlying statistical distribution of a data stream changes over time, a phenomenon known as concept drift. Concept drift can adversely affect IoT data management and analysis, which rely heavily on data-driven cognitive technologies. It should therefore be detected as soon as it occurs, which is challenging because of growing feature dimensionality and the lack of ground truth. Adaptive countermeasures are also difficult to design when data streams are generated at high frequency and require latency-sensitive responses. The uncertainty and temporal dependencies of IoT data streams further complicate concept drift management. This work proposes RADAR, a reactive drift management framework for streaming IoT applications that simultaneously detects and reacts to concept drift using two novel methods: a temporal discrepancy measure and an intensity-aware analyser. Together, these methods determine the adaptation decision needed to maintain reliable performance, thereby limiting how often the ML model must be updated. Experiments on synthetic and real-world setups comprising end-to-end systems demonstrate that RADAR outperforms the benchmark methods, achieving greater performance improvement with a best F-score of 0.86 and an efficient runtime on large data streams.
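To make the notion of reacting to concept drift concrete, the following is a minimal sketch of a generic two-window detector on a synthetic univariate stream. It is not RADAR's temporal discrepancy measure or intensity-aware analyser, which the abstract does not specify; the function name `detect_drift`, the window size, the threshold, and the standardised mean-shift score are all assumptions chosen for illustration.

```python
import random
import statistics
from collections import deque


def detect_drift(stream, window=50, threshold=6.0):
    """Return stream indices at which a drift alarm is raised.

    Hypothetical illustration: compares a fixed reference window with a
    sliding recent window using a standardised mean-shift score.
    """
    reference = deque(maxlen=window)  # samples assumed to follow the old concept
    recent = deque(maxlen=window)     # most recent samples
    alarms = []
    for i, x in enumerate(stream):
        if len(reference) < window:
            reference.append(x)
            continue
        recent.append(x)
        if len(recent) < window:
            continue
        ref_mean = statistics.fmean(reference)
        ref_std = statistics.stdev(reference) or 1e-9  # guard against zero spread
        # Shift of the recent mean, in units of the reference standard error.
        score = abs(statistics.fmean(recent) - ref_mean) / (ref_std / window ** 0.5)
        if score > threshold:
            alarms.append(i)
            # Reactive step (assumed): adopt the recent window as the new reference.
            reference = deque(recent, maxlen=window)
            recent.clear()
    return alarms


# Synthetic stream with an abrupt change in mean at index 500.
rng = random.Random(0)
stream = [rng.gauss(0.0, 1.0) for _ in range(500)]
stream += [rng.gauss(3.0, 1.0) for _ in range(500)]
alarms = detect_drift(stream)
print(alarms)
```

Resetting the reference only when an alarm fires mirrors, in a very simplified form, the idea of updating a model only when drift warrants it rather than on every batch.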
