Research on Data Stream Clustering Algorithm Based on Decay Time Window

With the rapid development of the information society, data streams have become the main data model in many fields, and data stream clustering algorithms are particularly important for extracting the useful information these streams contain. Two key issues arise when a clustering algorithm processes a data stream: how to judge outliers, and how to eliminate outdated data in time. To address these two problems, this paper proposes the DCluStream algorithm. The algorithm introduces a buffer processing mechanism for abnormal data so that it can correctly judge whether these data are outliers. In addition, DCluStream adds a decay time window to the online micro-clustering stage and assigns a weight to each data point. By monitoring the real-time weight of each micro-cluster, the algorithm eliminates expired micro-clusters in time and gives greater emphasis to recent data, in order to achieve accurate clustering. Finally, the DCluStream algorithm is evaluated in simulation experiments on the KDD CUP99 data set. The experimental results show that the new algorithm improves clustering quality, reduces clustering processing time, and lowers memory usage.
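The two mechanisms described above, a buffer for suspected outliers and decay-weighted micro-clusters, can be illustrated with a minimal Python sketch. It is not the DCluStream implementation: the abstract does not specify the decay function, the thresholds, or the buffer rule, so the exponential decay 2^(-decay_rate * dt), the class names MicroCluster and DecayWindowSketch, and the parameters radius, decay_rate, min_weight, and buffer_limit are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Decayed summary of nearby points (hypothetical structure)."""
    center: list
    weight: float
    last_update: float

    def decayed_weight(self, now, decay_rate):
        # Assumed decay: the weight halves every 1/decay_rate time units.
        return self.weight * 2.0 ** (-decay_rate * (now - self.last_update))

    def absorb(self, point, now, decay_rate):
        # Fade the old weight first, then fold in the new point (weight 1).
        w = self.decayed_weight(now, decay_rate)
        self.center = [(w * c + p) / (w + 1.0) for c, p in zip(self.center, point)]
        self.weight = w + 1.0
        self.last_update = now


class DecayWindowSketch:
    """Illustrative online stage: outlier buffer plus decay-based pruning."""

    def __init__(self, radius=1.0, decay_rate=0.1, min_weight=0.5, buffer_limit=3):
        self.radius = radius              # assumed absorption radius
        self.decay_rate = decay_rate      # assumed decay speed
        self.min_weight = min_weight      # assumed expiry threshold
        self.buffer_limit = buffer_limit  # assumed points needed to form a cluster
        self.clusters = []
        self.buffer = []                  # (point, timestamp) pairs awaiting a verdict

    def insert(self, point, now):
        self._prune(now)
        nearest = min(self.clusters,
                      key=lambda mc: math.dist(mc.center, point), default=None)
        if nearest is not None and math.dist(nearest.center, point) <= self.radius:
            nearest.absorb(point, now, self.decay_rate)
            return
        # The point fits no cluster: hold it in the buffer instead of
        # declaring it an outlier immediately.
        self.buffer.append((point, now))
        self._promote(point, now)

    def _promote(self, point, now):
        # Enough buffered points close together are treated as a new cluster.
        near = [b for b in self.buffer if math.dist(b[0], point) <= self.radius]
        if len(near) < self.buffer_limit:
            return
        weights = [2.0 ** (-self.decay_rate * (now - t)) for _, t in near]
        total = sum(weights)
        center = [sum(w * p[i] for w, (p, _) in zip(weights, near)) / total
                  for i in range(len(point))]
        self.clusters.append(MicroCluster(center, total, now))
        self.buffer = [b for b in self.buffer if b not in near]

    def _prune(self, now):
        # Drop micro-clusters whose decayed weight fell below the threshold.
        self.clusters = [mc for mc in self.clusters
                         if mc.decayed_weight(now, self.decay_rate) >= self.min_weight]
        # Buffered points that stayed isolated for too long are discarded as outliers.
        self.buffer = [(p, t) for p, t in self.buffer
                       if 2.0 ** (-self.decay_rate * (now - t)) >= self.min_weight]
```

In this sketch a point that fits no existing micro-cluster is judged only after it has waited in the buffer: it either gains enough neighbours to form a new micro-cluster or ages out as an outlier. Pruning by decayed weight is what keeps memory bounded as the stream grows.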