Semi-supervised clustering with pairwise and size constraints

In recent years, semi-supervised clustering has received considerable attention in the pattern recognition and data mining communities. Algorithms of this type take advantage of partial prior knowledge and have been observed to perform significantly better than traditional unsupervised clustering algorithms. The prior knowledge usually takes the form of pairwise constraints, which specify whether two points should be placed in the same cluster or in different clusters. Other forms of constraints, such as the balance constraint and the size constraint, have also attracted research interest. However, each type of prior knowledge may carry its own bias when used alone, so it is important to consider different types of constraints simultaneously. In this paper, we propose an improved algorithm that incorporates pairwise and size constraints into a unified framework. Experiments on several benchmark data sets show that the proposed unified algorithm outperforms previous approaches under a variety of conditions, indicating that judicious integration of different types of constraints can yield better performance than using a single kind of constraint.
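To make the two kinds of constraints concrete, the following minimal sketch counts how many of them a given cluster assignment violates. It is an illustration only, not the algorithm proposed in the paper: the function name `count_violations`, the pair lists `must_link` and `cannot_link`, and the `target_sizes` mapping are all hypothetical names chosen for this example.

```python
from collections import Counter


def count_violations(labels, must_link, cannot_link, target_sizes):
    """Return (must-link violations, cannot-link violations, size deviation).

    labels       -- cluster label of each point, indexed by point id
    must_link    -- pairs (i, j) that should share a cluster (assumed format)
    cannot_link  -- pairs (i, j) that should be separated (assumed format)
    target_sizes -- desired number of points per cluster label (assumed format)
    """
    # A must-link pair is violated when its two points have different labels.
    ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when its two points share a label.
    cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    # Size deviation: total absolute gap between actual and desired sizes.
    sizes = Counter(labels)
    size_dev = sum(abs(sizes.get(c, 0) - n) for c, n in target_sizes.items())
    return ml, cl, size_dev


labels = [0, 0, 1, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 4), (2, 3)]
target_sizes = {0: 3, 1: 2}
print(count_violations(labels, must_link, cannot_link, target_sizes))  # (1, 1, 2)
```

In this toy assignment, each constraint type flags a different problem, which is the sense in which combining them can give more information than either alone.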
