Semi-Supervised Stream Clustering Using Labeled Data Points

Semi-supervised stream clustering performs cluster analysis of data streams by exploiting background or domain expert knowledge. Most existing semi-supervised stream clustering techniques encode background knowledge as pairwise constraints, such as must-link and cannot-link constraints. Such constraints are ill-suited to the dynamic nature of data streams. In this paper, we propose a new semi-supervised stream clustering algorithm, SSE-Stream. SSE-Stream exploits background knowledge in the form of single labeled data points to monitor and detect changes as the clustering structure evolves. Single labeled data points are better suited to data streams: they can be used immediately to determine the class of clusters, and they effectively accommodate the changing behavior of data streams. SSE-Stream defines a new cluster representation that incorporates labeled data points, and uses it to extend clustering operations such as merge and split so that changes in the evolving clustering structure can be detected. Experimental results on real-world stream datasets show that SSE-Stream improves output clustering quality, especially on highly complex and drifting datasets.
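The abstract does not spell out SSE-Stream's cluster representation. What follows is a minimal sketch that assumes a CluStream-style cluster feature summary (point count, per-dimension linear sum and squared sum) extended with per-class counts of labeled points. It shows how such a summary could absorb points one at a time, derive a cluster class from its labeled points, and merge additively. The name `LabeledMicroCluster` and all of its methods are hypothetical. The `has_label_conflict` check, which flags labeled points of more than one class as a possible split signal, is likewise an illustrative heuristic and is not taken from the paper.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class LabeledMicroCluster:
    """Hypothetical cluster summary: cluster-feature statistics plus label counts."""
    dim: int
    n: int = 0
    linear_sum: list[float] = field(default_factory=list)
    squared_sum: list[float] = field(default_factory=list)
    label_counts: Counter = field(default_factory=Counter)

    def __post_init__(self) -> None:
        # Start with zeroed per-dimension sums if none were supplied.
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim
        if not self.squared_sum:
            self.squared_sum = [0.0] * self.dim

    def insert(self, point: Sequence[float], label: Optional[str] = None) -> None:
        # Absorb one point incrementally; unlabeled points update only the statistics.
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x
        if label is not None:
            self.label_counts[label] += 1

    def centroid(self) -> list[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root-mean-square deviation of absorbed points around the centroid.
        var = [max(ss / self.n - (ls / self.n) ** 2, 0.0)
               for ls, ss in zip(self.linear_sum, self.squared_sum)]
        return math.sqrt(sum(var) / self.dim)

    def cluster_class(self) -> Optional[str]:
        # Majority label among labeled points; None if no labeled point has arrived.
        if not self.label_counts:
            return None
        return self.label_counts.most_common(1)[0][0]

    def has_label_conflict(self) -> bool:
        # Hypothetical split signal: labeled points from more than one class.
        return len(self.label_counts) > 1

    def merge(self, other: LabeledMicroCluster) -> LabeledMicroCluster:
        # The summaries are additive, so merging sums counts, sums, and label counts.
        merged = LabeledMicroCluster(dim=self.dim)
        merged.n = self.n + other.n
        merged.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        merged.squared_sum = [a + b for a, b in zip(self.squared_sum, other.squared_sum)]
        merged.label_counts = self.label_counts + other.label_counts
        return merged


a = LabeledMicroCluster(dim=2)
a.insert([1.0, 1.0], label="x")
a.insert([3.0, 1.0], label="x")
b = LabeledMicroCluster(dim=2)
b.insert([2.0, 4.0], label="y")
m = a.merge(b)
print(m.n, m.centroid(), m.cluster_class(), m.has_label_conflict())  # 3 [2.0, 2.0] x True
```

Because each field is a running sum, a cluster can be updated in constant time per point and merged without revisiting raw data. This additivity is the property that makes summaries of this kind practical for streams. A single labeled point is enough to assign a class to a cluster immediately, which matches the motivation given in the abstract.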