Towards Efficient Data Sampling for Temporal Anomaly Detection in Sensor Networks

Data stream sampling builds a sample on which later analysis tasks are performed. Several parameters affect the quality of that sample: the sampling algorithm, the sampling rate, and, when the sliding window model is adopted, the window size. Given a stream of items, the most challenging task is therefore to select the most suitable sampling technique and the right parameters for it. In this paper, we study the impact of data sampling on anomaly detection results. First, we develop a new version of the Weighted Random Sampling (WRS) algorithm that samples each item based on its value relative to the values of its neighbors in the current sliding window. We then study the impact of the sampling process on anomaly detection using the Exponentially Weighted Moving Average (EWMA) algorithm. The sampling algorithms are compared on their response time when an anomaly occurs and on the relevance of the detected anomalies.
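
To make the two building blocks concrete, the following Python sketch pairs a weighted random sample of one window with a textbook EWMA control chart. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the weight (distance of an item to the window mean), the function names, and the parameters k, lam, L, mu0 and sigma0 are all hypothetical choices. The sampling step uses the standard Efraimidis-Spirakis key u^(1/w).

```python
import heapq
import math
import random
from statistics import mean


def weighted_window_sample(window, k, rng):
    """Draw k items from one window, favouring values far from the window mean.

    Assumption: the weight |x - mean| is a stand-in chosen for illustration;
    the paper's own weighting of an item against its neighbors is not given here.
    """
    mu = mean(window)
    keyed = []
    for idx, x in enumerate(window):
        w = abs(x - mu) + 1e-9               # small offset keeps the weight positive
        key = rng.random() ** (1.0 / w)      # Efraimidis-Spirakis key u^(1/w)
        keyed.append((key, idx))
    chosen = heapq.nlargest(k, keyed)        # the k largest keys form the sample
    # Return the sampled values in their original stream order.
    return [window[i] for i in sorted(idx for _, idx in chosen)]


def ewma_anomalies(values, mu0, sigma0, lam=0.2, L=3.0):
    """Return the indices at which the EWMA statistic leaves its control limits.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = mu0.
    mu0 and sigma0 are the assumed in-control mean and standard deviation.
    """
    z = mu0
    flagged = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        # Time-varying standard deviation of z_t under the in-control model.
        spread = sigma0 * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        if abs(z - mu0) > L * spread:
            flagged.append(t - 1)
    return flagged


# Hypothetical stream: 20 in-control readings followed by a level shift.
stream = [10.0] * 20 + [20.0] * 5
flagged = ewma_anomalies(stream, mu0=10.0, sigma0=1.0)

# Sample 5 of the last 10 readings with a seeded generator for repeatability.
sample = weighted_window_sample(stream[-10:], k=5, rng=random.Random(0))
```

On this step-shaped input the chart first signals at index 20, the first shifted reading, so the detection delay is zero. Running the detector on a sampled stream instead of the full stream is one way to measure the response time and relevance criteria mentioned above.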
