Paper Information - Identifying and Generating Easy Sets of Constraints for Clustering

Identifying and Generating Easy Sets of Constraints for Clustering

Clustering under constraints is a recent innovation in the artificial intelligence community that has yielded significant practical benefit. However, recent work has shown that for some negative forms of constraints the associated subproblem of just finding a feasible clustering is NP-complete. These worst-case results for the entire problem class say nothing about where easy problem instances lie or how prevalent they are. In this work, we show that there are large pockets within these problem classes where clustering under constraints is easy, and that using easy sets of constraints yields better empirical results. We then illustrate several sufficient conditions from graph theory that identify a priori where these easy problem instances are, and we present algorithms to create large, easy-to-satisfy constraint sets.

S. S. Ravi | Ian Davidson
