Clustering with Partition Level Side Information

Constrained clustering uses prior knowledge to improve clustering performance. The existing literature focuses mainly on Must-Link and Cannot-Link pairwise constraints. However, pairwise constraints do not match the way people naturally make decisions, and they are vulnerable to noisy constraints and sensitive to the order in which constraints are applied. In light of this, we use partition level side information instead of pairwise constraints to guide the clustering process. Compared with pairwise constraints, partition level side information preserves consistency within the partial structure, avoids self-contradiction, and is unaffected by constraint order. In practice, human workers label only a small portion of the data instances, and these labels supervise the clustering procedure. Inspired by the success of ensemble clustering, we aim to find a clustering solution that captures the intrinsic structure of the data itself and agrees with the partition level side information as much as possible. We derive the corresponding objective function and equivalently transform it into a K-means-like optimization problem. Extensive experiments on several real-world datasets demonstrate the effectiveness and efficiency of our method compared with pairwise constrained clustering and consensus clustering, which supports the advantage of partition level side information over pairwise constraints. In addition, our method is highly robust to noisy side information.
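
The abstract does not spell out the objective function, so the following is only a minimal sketch of what a K-means-like procedure guided by partition level side information could look like, not the authors' algorithm. It assumes a squared Euclidean feature cost plus a penalty, weighted by a hypothetical parameter `lam`, on disagreement between a labeled point's one-hot side label and its cluster's average side label. The function name `side_info_kmeans`, the encoding of unlabeled points as `-1`, and the farthest-first initialization are all illustrative choices.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)


def side_info_kmeans(X, side, k, lam=1.0, n_iter=100):
    """Illustrative K-means-like clustering with partial labels (assumed form).

    X    : (n, d) feature matrix
    side : length-n integer labels, -1 marks an unlabeled instance
    k    : number of clusters
    lam  : weight of the side-information term (hypothetical parameter)
    """
    X = np.asarray(X, dtype=float)
    side = np.asarray(side)
    n = X.shape[0]
    labeled = side >= 0
    n_side = int(side.max()) + 1 if labeled.any() else 0

    # One-hot encoding of the side labels; unlabeled rows stay all-zero.
    S = np.zeros((n, n_side))
    S[np.flatnonzero(labeled), side[labeled]] = 1.0

    # Deterministic farthest-first initialization on the features only.
    centers = [0]
    d = sq_dists(X, X[[0]])[:, 0]
    for _ in range(1, k):
        centers.append(int(d.argmax()))
        d = np.minimum(d, sq_dists(X, X[[centers[-1]]])[:, 0])
    C_x = X[centers].copy()
    C_s = np.zeros((k, n_side))
    assign = sq_dists(X, C_x).argmin(axis=1)

    for _ in range(n_iter):
        # Update step: feature centroids use all members of a cluster,
        # side-information centroids use only its labeled members.
        for j in range(k):
            members = assign == j
            if members.any():
                C_x[j] = X[members].mean(axis=0)
            if (members & labeled).any():
                C_s[j] = S[members & labeled].mean(axis=0)

        # Assignment step: unlabeled points pay only the feature cost.
        cost = sq_dists(X, C_x) + lam * labeled[:, None] * sq_dists(S, C_s)
        new_assign = cost.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    return assign


# Toy data: the point at 4.0 is closer to the left group in feature space,
# but its side label ties it to the right group.
X_demo = [[0.0], [1.0], [4.0], [9.0], [10.0]]
side_demo = [0, -1, 1, -1, 1]
print(side_info_kmeans(X_demo, side_demo, k=2, lam=0.0).tolist())    # [0, 0, 0, 1, 1]
print(side_info_kmeans(X_demo, side_demo, k=2, lam=100.0).tolist())  # [0, 0, 1, 1, 1]
```

With `lam=0.0` the side labels are ignored and the procedure reduces to ordinary K-means; a large `lam` makes the labeled points follow the given partial partition, which in this toy case moves the ambiguous point at 4.0 into the right-hand cluster.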
