An Anomaly Detection Algorithm Selection Service for IoT Stream Data Based on Tsfresh Tool and Genetic Algorithm

Anomaly detection algorithms (ADAs) have been widely deployed as services in many maintenance monitoring platforms. However, numerous algorithms could be applied to such fast-changing stream data, and because IoT stream data are dynamic, concept drift occurs. Choosing a suitable anomaly detection service (ADS) in real time is therefore a challenging task. To detect anomalous data accurately online, this paper develops a service selection method that selects and configures an ADS at run time. First, a time-series feature extractor (Tsfresh) and a genetic-algorithm-based feature selection method are applied to quickly extract the dominant features that represent the patterns in the stream data. In addition, stream data and the various efficient algorithms applied to them are collected as historical data. A fast XGBoost-based classification model is then trained on these stream data features to identify the appropriate ADS dynamically at run time. Together, these methods choose a suitable service and its configuration based on the patterns of the stream data. The features used to describe the intrinsic characteristics of the time-series data are the main success factor in the framework, so experiments are conducted to evaluate the effectiveness of the features chosen by the genetic algorithm. Experiments on both artificial and real datasets demonstrate that the proposed method achieves higher accuracy than various advanced approaches and can efficiently choose an appropriate service in different scenarios.
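The selection loop summarized above can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions. Four hand-computed statistics (mean, standard deviation, range, mean absolute difference) stand in for Tsfresh features. A leave-one-out nearest-centroid classifier stands in for XGBoost, so the example needs only the Python standard library. The service labels `ADS_A`/`ADS_B` and the synthetic windows are hypothetical. The genetic algorithm evolves a binary feature mask using truncation selection, one-point crossover, and bit-flip mutation, and scores each mask by classification accuracy minus a small per-feature penalty.

```python
import math
import random
import statistics

def extract_features(window):
    """Hypothetical stand-ins for Tsfresh features: mean, std, range, mean |diff|."""
    diffs = [abs(b - a) for a, b in zip(window, window[1:])]
    return [statistics.fmean(window), statistics.pstdev(window),
            max(window) - min(window), statistics.fmean(diffs)]

def predict(X, y, idx, query):
    """Nearest-centroid classifier (stand-in for XGBoost) over feature indices idx."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append([row[i] for i in idx])
    def dist(label):
        centroid = [statistics.fmean(col) for col in zip(*groups[label])]
        return sum((q - c) ** 2 for q, c in zip(query, centroid))
    return min(groups, key=dist)

def fitness(X, y, mask):
    """Leave-one-out accuracy on the selected features minus 0.01 per feature."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    hits = 0
    for k in range(len(X)):
        rest_X, rest_y = X[:k] + X[k + 1:], y[:k] + y[k + 1:]
        hits += predict(rest_X, rest_y, idx, [X[k][i] for i in idx]) == y[k]
    return hits / len(X) - 0.01 * len(idx)

def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Evolve a binary feature mask; returns the fittest mask found."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        parents = pop[: pop_size // 2]  # fitter half survives unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g  # bit-flip mutation
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))

# Hypothetical historical data: smooth windows labelled "ADS_A", noisy ones "ADS_B".
rng = random.Random(42)
X, y = [], []
for _ in range(10):
    X.append(extract_features([math.sin(t / 5) + rng.gauss(0, 0.05) for t in range(50)]))
    y.append("ADS_A")
    X.append(extract_features([rng.gauss(0, 1.0) for _ in range(50)]))
    y.append("ADS_B")

mask = ga_select(X, y)
idx = [i for i, bit in enumerate(mask) if bit]
new_window = [rng.gauss(0, 1.0) for _ in range(50)]
print(mask, predict(X, y, idx, [extract_features(new_window)[i] for i in idx]))
```

The per-feature penalty nudges the search toward compact feature subsets, which keeps run-time extraction cheap. In a real pipeline the feature vectors would more plausibly come from `tsfresh.extract_features` and the classifier from `xgboost.XGBClassifier`; both substitutions here are illustrative only.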
