Interactive constrained clustering for patent document set

Constrained clustering is attracting attention as a useful way to group a data set into intended clusters using feedback from users. We develop an interactive document clustering method that employs constrained clustering to group patent documents into technological categories based on their contents. The method aims to improve clustering accuracy progressively by repeating two steps: assigning the appropriate cluster to selected documents and applying constrained clustering. We evaluate how many assigned documents our method needs to reach adequate accuracy, and which documents should be assigned to achieve the desired result with fewer assignments. Moreover, as the assigning and clustering steps are repeated, the process reaches a point beyond which clustering accuracy improves only by the number of documents whose true clusters have been given, that is, the clustering itself no longer contributes additional correct assignments. We propose an approach to predict this point based on the amount of cluster label changes in the K-Means loop.
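The assign-then-cluster loop and the label-change signal can be illustrated with a minimal sketch. The abstract does not name the constrained clustering algorithm or the exact prediction rule, so everything here is an assumption. The sketch uses a seeded K-Means variant: centroids are initialized from user-labeled documents, and those documents stay pinned to their assigned cluster. It uses squared Euclidean distance on generic document vectors (for example, TF-IDF vectors), picks the documents to label in random order, and simply records the total number of label changes inside the K-Means loop for each round. The function names `assign`, `constrained_kmeans`, and `simulate_rounds` are hypothetical, and no threshold for predicting the saturation point is applied.

```python
import numpy as np


def assign(X, centroids, fixed):
    """Assign each row of X to its nearest centroid (squared Euclidean),
    then force user-labeled documents back to their given cluster."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    for i, c in fixed.items():
        labels[i] = c
    return labels


def constrained_kmeans(X, k, fixed, max_iter=100, seed=0):
    """Hypothetical seeded K-Means with pinned labels.

    fixed: dict mapping document index -> cluster id given by the user.
    Returns the final labels and the number of label changes observed
    in each iteration after the initial seeded assignment.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centroids = np.empty((k, d))
    for c in range(k):
        seeds = [i for i, ci in fixed.items() if ci == c]
        # Seed from the mean of user-labeled documents; otherwise a random document.
        centroids[c] = X[seeds].mean(axis=0) if seeds else X[rng.integers(n)]
    labels = assign(X, centroids, fixed)
    changes = []
    for _ in range(max_iter):
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
        new_labels = assign(X, centroids, fixed)
        n_changed = int((new_labels != labels).sum())
        changes.append(n_changed)
        labels = new_labels
        if n_changed == 0:
            break
    return labels, changes


def simulate_rounds(X, true_labels, k, per_round, rounds, seed=0):
    """Hypothetical simulation of the interactive loop: each round a simulated
    user gives the true cluster of `per_round` more documents, clustering is
    rerun, and (number labeled, total label changes in the loop) is recorded."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # random choice of documents to label (assumption)
    fixed = {}
    history = []
    for r in range(rounds):
        for i in order[r * per_round:(r + 1) * per_round]:
            fixed[int(i)] = int(true_labels[i])
        _, changes = constrained_kmeans(X, k, fixed, seed=seed)
        history.append((len(fixed), sum(changes)))
    return history
```

For two well-separated groups with one labeled document each, the seeded start already matches the final partition, so the loop records no changes. A total that stays near zero across rounds is the kind of signal the abstract suggests could indicate that clustering no longer adds correct assignments beyond the labeled documents; how that signal is turned into a prediction is not specified here.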