A Fast Clustering Method for Real-Time IoT Data Streams

Clustering is an effective data analysis technique that is widely applied in IoT-based applications. Building on a study of existing data clustering proposals, this work proposes a new clustering method for IoT data streams. First, the characteristics of PML documents in the process of data acquisition and identification are introduced, and a hybrid PML document similarity calculation method based on the Bayesian network is developed to assist in clustering data streams. Second, a PML data stream clustering method based on a dynamic sliding window is proposed. Finally, we evaluate the performance of our clustering method and related methods with respect to running time, similarity, purity, entropy, and F-measure. Experimental results show that the proposed clustering approach can adaptively learn from data streams that change over time while still maintaining comparable accuracy and speed.
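
The abstract names purity, entropy, and F-measure as evaluation criteria but does not state the formulas used. As a minimal sketch, assuming the common textbook definitions of these external clustering measures (which may differ from the paper's exact variants, for example in normalization), they can be computed from ground-truth classes and predicted cluster labels as follows. The function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def contingency(labels_true, labels_pred):
    """Map each predicted cluster to a Counter of the true classes it contains."""
    table = defaultdict(Counter)
    for truth, cluster in zip(labels_true, labels_pred):
        table[cluster][truth] += 1
    return table


def purity(labels_true, labels_pred):
    """Fraction of items that belong to the majority class of their cluster."""
    table = contingency(labels_true, labels_pred)
    return sum(max(c.values()) for c in table.values()) / len(labels_true)


def entropy(labels_true, labels_pred):
    """Size-weighted average class entropy per cluster (base 2, unnormalized)."""
    table = contingency(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((c / size) * log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


def f_measure(labels_true, labels_pred):
    """Class-size-weighted best F1 score of each true class over all clusters."""
    table = contingency(labels_true, labels_pred)
    n = len(labels_true)
    score = 0.0
    for cls, cls_size in Counter(labels_true).items():
        best = 0.0
        for counts in table.values():
            overlap = counts[cls]  # Counter yields 0 for absent classes
            if overlap == 0:
                continue
            precision = overlap / sum(counts.values())
            recall = overlap / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (cls_size / n) * best
    return score


if __name__ == "__main__":
    # Toy data (hypothetical): two true classes, two predicted clusters.
    truth = ["a", "a", "a", "b", "b", "b"]
    pred = [0, 0, 1, 1, 1, 1]
    print(purity(truth, pred), entropy(truth, pred), f_measure(truth, pred))
```

Higher purity and F-measure indicate better agreement with the ground truth, while lower entropy indicates more homogeneous clusters.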
