Density-Based Heterogeneous Data Stream Clustering Algorithm with Mixed Distance Measure Methods

Heterogeneous data stream clustering is an important problem in data stream mining. Because existing heterogeneous clustering algorithms have limited accuracy and lack a common distance measure, we propose HDSDen, a density-based heterogeneous data stream clustering algorithm with mixed distance measures. HDSDen adopts an online/offline two-stage processing framework. In the online stage, the algorithm selects a distance measure according to the dominant property of the data and uses it to identify the core points among the arriving points; the purpose of using different distance calculations is to reduce the influence of the non-dominant property on overall clustering accuracy. In the offline stage, all density-reachable points form a cluster, and the points that remain unclustered are placed in a reservoir. When the number of points in the reservoir exceeds a threshold, these points are re-clustered to improve clustering accuracy. Experiments on real data sets show that the algorithm achieves better clustering results, can report clustering results at any time, and handles heterogeneous data streams efficiently.
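
The abstract does not state how the dominant property is determined or which distance measures are combined, so the following Python sketch is only an illustration under explicit assumptions, not the HDSDen implementation. It assumes that each point is a pair of numeric and categorical attribute tuples, that the attribute type with more attributes is the dominant one, that Euclidean distance and mismatch counting are the two measures, and that the non-dominant part is scaled by a hypothetical `minor_weight`. The names `mixed_distance`, `cluster_with_reservoir`, `eps`, and `min_pts` are likewise illustrative, and the clustering step is a plain DBSCAN-style expansion standing in for the offline stage.

```python
from math import sqrt


def mixed_distance(a, b, minor_weight=0.5):
    """Distance between two points given as (numeric_tuple, categorical_tuple).

    Assumption: the attribute type with more attributes is "dominant";
    the non-dominant part is down-weighted by the hypothetical minor_weight.
    """
    num_a, cat_a = a
    num_b, cat_b = b
    euclid = sqrt(sum((x - y) ** 2 for x, y in zip(num_a, num_b)))
    mismatch = sum(x != y for x, y in zip(cat_a, cat_b))
    if len(num_a) >= len(cat_a):  # numeric attributes dominate
        return euclid + minor_weight * mismatch
    return mismatch + minor_weight * euclid  # categorical attributes dominate


def cluster_with_reservoir(points, eps, min_pts):
    """Group density-reachable points; return (labels, reservoir).

    A point is a core point if at least min_pts points (itself included)
    lie within eps of it. Points that end up in no cluster go to the reservoir.
    """
    neighbors = [
        [j for j, q in enumerate(points) if mixed_distance(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]
    labels = [None] * len(points)
    next_id = 0
    for i in range(len(points)):
        if not is_core[i] or labels[i] is not None:
            continue
        labels[i] = next_id
        stack = [i]
        while stack:
            k = stack.pop()
            if not is_core[k]:
                continue  # border points join a cluster but do not expand it
            for j in neighbors[k]:
                if labels[j] is None:
                    labels[j] = next_id
                    stack.append(j)
        next_id += 1
    reservoir = [p for p, label in zip(points, labels) if label is None]
    return labels, reservoir


points = [
    ((0.0, 0.0), ("a",)),
    ((0.1, 0.0), ("a",)),
    ((0.0, 0.1), ("a",)),
    ((5.0, 5.0), ("b",)),
]
labels, reservoir = cluster_with_reservoir(points, eps=0.5, min_pts=3)
```

In this toy run the three nearby points form one cluster and the distant point is left in the reservoir. A caller could compare `len(reservoir)` against a threshold and run `cluster_with_reservoir` again on the reservoir contents, mirroring the re-clustering step described in the abstract.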