With the rapid development of the information society, data streams have become the main data model in many fields, and data stream clustering algorithms are particularly important for extracting the useful information these streams contain. Two key issues arise when a clustering algorithm processes a data stream: how to identify outliers, and how to eliminate outdated data in time. To address both problems, this paper proposes the DCluStream algorithm. First, DCluStream uses a buffer-processing mechanism for abnormal data points so that it can correctly judge whether they are genuine outliers. Second, it adds a decaying time window to the online micro-clustering stage and assigns a weight to every data point. By monitoring the real-time weight of each micro-cluster, the algorithm removes expired micro-clusters promptly and gives greater emphasis to recent data, which yields more accurate clustering. Finally, simulation experiments on the KDD CUP99 data set show that the new algorithm improves clustering quality, reduces clustering processing time, and lowers memory usage.
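The abstract does not give DCluStream's formulas, so the following Python sketch only illustrates the two ideas it names, under explicit assumptions. The decay factor 2^(-λΔt) is borrowed from fading-window stream clustering methods such as DenStream. A fixed absorption distance stands in for CluStream's maximal-boundary test. The buffer rule is also assumed: a point that fits no micro-cluster waits in a buffer until either enough nearby points arrive to form a new micro-cluster or it times out and is declared an outlier. All constants, classes, and methods (`LAMBDA`, `MicroCluster`, `DecayedStream`, and so on) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

# Hypothetical parameters; the abstract does not state DCluStream's values.
LAMBDA = 0.5        # decay rate in the assumed fading function 2^(-LAMBDA * dt)
MAX_RADIUS = 1.0    # assumed fixed absorption distance to a micro-cluster center
MIN_WEIGHT = 0.2    # micro-clusters whose faded weight drops below this are pruned
BUFFER_MIN_PTS = 3  # buffered neighbours needed to promote a new micro-cluster
BUFFER_TTL = 5.0    # buffered points older than this are declared outliers


def fading(dt: float) -> float:
    """Assumed exponential decay factor for an elapsed time dt >= 0."""
    return 2.0 ** (-LAMBDA * dt)


@dataclass
class MicroCluster:
    ls: List[float]      # decayed linear sum of absorbed points
    weight: float        # decayed number of absorbed points
    last_update: float   # time of the last fade

    @classmethod
    def from_entries(cls, entries: List[Tuple[Point, float]]) -> "MicroCluster":
        # Build a micro-cluster from buffered (point, time) pairs in arrival order.
        mc = cls([0.0] * len(entries[0][0]), 0.0, entries[0][1])
        for q, t in entries:
            mc.insert(q, t)
        return mc

    def fade_to(self, now: float) -> None:
        # Age the statistics: older contributions lose weight exponentially.
        f = fading(now - self.last_update)
        self.weight *= f
        self.ls = [v * f for v in self.ls]
        self.last_update = now

    def insert(self, p: Point, now: float) -> None:
        self.fade_to(now)
        self.weight += 1.0  # each new point arrives with weight 1
        self.ls = [a + b for a, b in zip(self.ls, p)]

    def center(self) -> List[float]:
        return [v / self.weight for v in self.ls]


class DecayedStream:
    def __init__(self) -> None:
        self.clusters: List[MicroCluster] = []
        self.buffer: List[Tuple[Point, float]] = []
        self.outliers: List[Point] = []

    def process(self, p: Point, now: float) -> None:
        self._expire_buffer(now)
        nearest = self._nearest(p)
        if nearest is not None and math.dist(nearest.center(), p) <= MAX_RADIUS:
            nearest.insert(p, now)
        else:
            # Do not call the point an outlier yet; hold it in the buffer.
            self.buffer.append((p, now))
            self._promote_buffer(p)
        self._prune(now)

    def _nearest(self, p: Point) -> Optional[MicroCluster]:
        if not self.clusters:
            return None
        return min(self.clusters, key=lambda mc: math.dist(mc.center(), p))

    def _expire_buffer(self, now: float) -> None:
        # Buffered points that never gathered enough neighbours become outliers.
        self.outliers += [q for q, t in self.buffer if now - t > BUFFER_TTL]
        self.buffer = [(q, t) for q, t in self.buffer if now - t <= BUFFER_TTL]

    def _promote_buffer(self, newest: Point) -> None:
        # If enough buffered points lie near the newest one, they form a micro-cluster.
        idx = [i for i, (q, _) in enumerate(self.buffer)
               if math.dist(q, newest) <= MAX_RADIUS]
        if len(idx) >= BUFFER_MIN_PTS:
            self.clusters.append(MicroCluster.from_entries([self.buffer[i] for i in idx]))
            chosen = set(idx)
            self.buffer = [e for i, e in enumerate(self.buffer) if i not in chosen]

    def _prune(self, now: float) -> None:
        # Fade every micro-cluster to the current time and drop expired ones.
        for mc in self.clusters:
            mc.fade_to(now)
        self.clusters = [mc for mc in self.clusters if mc.weight >= MIN_WEIGHT]
```

Feeding the points (0, 0), (0.1, 0), (0, 0.1), (9, 9), (0.05, 0.05) at times 0 through 4 builds one micro-cluster from the first three points, leaves (9, 9) waiting in the buffer, and absorbs (0.05, 0.05). A later arrival at time 100 expires (9, 9) as an outlier and prunes the now-stale micro-cluster, which is the "eliminate outdated data in time" behaviour the abstract describes. Fading is applied lazily (only when a cluster is touched or pruned) because the center, a ratio of decayed sums, does not change under a common decay factor.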