Paper information - Interactive constrained clustering for patent document set

Interactive constrained clustering for patent document set

Constrained clustering is attracting attention as a useful way to group a data set into intended clusters using users' feedback. We develop an interactive document clustering method that employs constrained clustering to group patent documents into technological categories based on their contents. The method aims to progressively improve clustering accuracy by repeating two steps: assigning appropriate clusters to documents and applying constrained clustering. We evaluate how many documents our method needs in order to reach adequate accuracy, and which documents should be assigned to achieve the desired result in fewer assignments. Moreover, as the assignment and clustering steps are repeated, the process reaches a point at which clustering accuracy improves only by the number of documents whose true clusters have been given. We propose an approach to predict this point based on the number of cluster label changes in the K-Means loop.
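The idea of counting cluster label changes inside the K-Means loop can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the simplest form of constraint, where a document that the user has assigned to a cluster stays pinned to it, and the function name `constrained_kmeans` and its parameters are hypothetical.

```python
import random


def constrained_kmeans(points, k, fixed, max_iter=50, seed=0):
    """Seeded K-Means sketch (illustrative only, not the paper's method).

    points: list of equal-length numeric tuples (document vectors).
    fixed:  dict {point index: cluster id} for user-assigned documents;
            these points are never reassigned.
    Returns (labels, changes), where changes[t] is the number of points
    whose label changed in iteration t.
    """
    rng = random.Random(seed)

    # Initialise each centroid from its user-assigned points if there are any,
    # otherwise from one randomly chosen point.
    centroids = []
    for c in range(k):
        seeds = [points[i] for i, lab in fixed.items() if lab == c]
        src = seeds if seeds else [rng.choice(points)]
        centroids.append([sum(col) / len(src) for col in zip(*src)])

    # Unassigned points start with the placeholder label -1.
    labels = [fixed.get(i, -1) for i in range(len(points))]
    changes = []

    for _ in range(max_iter):
        changed = 0
        for i, p in enumerate(points):
            if i in fixed:
                new = fixed[i]  # constraint: keep the user's assignment
            else:
                # Nearest centroid by squared Euclidean distance.
                new = min(
                    range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
                )
            if new != labels[i]:
                changed += 1
                labels[i] = new
        changes.append(changed)

        # Recompute centroids; an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

        if changed == 0:
            break

    return labels, changes


docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, changes = constrained_kmeans(docs, k=2, fixed={0: 0, 3: 1})
print(labels, changes)
```

In an interactive setting, the `changes` list from each round could be compared across rounds as one possible signal of how much the clustering is still moving; how the paper turns this quantity into a prediction is not specified in the abstract.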

Yusuke Sato | Makoto Iwayama

[1] Andrew McCallum,et al. Semi-Supervised Clustering with User Feedback , 2003 .

[2] Arindam Banerjee,et al. Semi-supervised Clustering by Seeding , 2002, ICML.

[3] Claire Cardie,et al. Constrained K-means Clustering with Background Knowledge. Proceedings of the Eighteenth International Conference on Machine Learning, 2001, p. 577–584.

[4] J. MacQueen. Some methods for classification and analysis of multivariate observations , 1967 .