Anomaly detection method for sensor network data streams based on sliding window sampling and optimized clustering

Abstract

When detecting abnormal data in a sensor network data stream, the source of the abnormal data must be identified accurately. Traditional data stream clustering algorithms suffer from substantial loss of clustering information and low accuracy. This paper therefore proposes a sensor network data stream anomaly detection method based on optimized clustering. First, the proposed sampling algorithm samples the data stream, and the result is used as the sample set. A dynamic data histogram then divides the data dimensions into dimension groups, the maximum-entropy division of each dimension's space into clusters is computed, and data belonging to the same dimension cluster are aggregated into micro-clusters. Anomalies in the data stream are detected by comparing the information entropy of each micro-cluster and its distribution characteristics. Experimental results show that the proposed algorithm improves the accuracy and effectiveness of data stream anomaly detection.
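The abstract names the pipeline steps but not the exact sampling rule, histogram construction, or entropy criterion. The Python sketch below is a minimal illustration under stated assumptions, not the authors' method. A count-based sliding window with uniform random sampling stands in for the proposed sampler. An equal-width grid per dimension stands in for the dynamic histogram and maximum-entropy division, and dimension grouping is omitted. Each occupied grid cell is treated as a micro-cluster, and a cell is flagged when its surprisal -log2(p) exceeds a threshold. The function names and the `bins` and `max_surprisal` parameters are hypothetical.

```python
import math
import random
from collections import Counter, deque


def sliding_window_sample(stream, window_size, sample_size, seed=0):
    """Uniformly sample up to `sample_size` readings from the last `window_size`
    items of `stream`. Hypothetical stand-in for the paper's sampling algorithm."""
    window = deque(maxlen=window_size)  # older readings are evicted automatically
    for reading in stream:
        window.append(reading)
    rng = random.Random(seed)  # seeded so runs are reproducible
    return rng.sample(list(window), min(sample_size, len(window)))


def grid_cell(point, lows, highs, bins):
    """Map a point to a tuple of equal-width histogram bin indices, one per
    dimension (assumed stand-in for the dynamic histogram / max-entropy division)."""
    cell = []
    for value, lo, hi in zip(point, lows, highs):
        width = (hi - lo) / bins or 1.0  # avoid division by zero on a constant dimension
        cell.append(min(int((value - lo) / width), bins - 1))  # clamp the maximum into the last bin
    return tuple(cell)


def entropy_anomalies(sample, bins=4, max_surprisal=3.0):
    """Treat each occupied grid cell as a micro-cluster and flag cells whose
    surprisal -log2(p) exceeds `max_surprisal` bits (assumed criterion).
    Returns (entropy of the cell distribution in bits, {cell: surprisal})."""
    if not sample:
        return 0.0, {}
    dims = len(sample[0])
    lows = [min(p[d] for p in sample) for d in range(dims)]
    highs = [max(p[d] for p in sample) for d in range(dims)]
    counts = Counter(grid_cell(p, lows, highs, bins) for p in sample)
    n = len(sample)
    probs = {cell: k / n for cell, k in counts.items()}
    entropy = -sum(p * math.log2(p) for p in probs.values())
    flagged = {cell: -math.log2(p) for cell, p in probs.items()
               if -math.log2(p) > max_surprisal}
    return entropy, flagged
```

For example, with 200 steady readings `(20.0 + 0.1 * (i % 10), 50.0 + 0.2 * (i % 10))` followed by one faulty reading `(80.0, 5.0)`, a window and sample size of 100, and the default parameters, only the grid cell holding the faulty reading is flagged, with a surprisal of log2(100), about 6.6 bits. A fixed equal-width grid is sensitive to the choice of `bins` and to extreme values that stretch each dimension's range, which is one reason a data-driven partition may be preferable.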
