Discriminative Semi-Supervised Clustering Analysis with Pairwise Constraints

Most existing semi-supervised clustering algorithms with pairwise constraints neither solve the problem of violation of pairwise constraints effectively, nor handle the high-dimensional data simultaneously. This paper presents a discriminative semi-supervised clustering analysis algorithm with pairwise constraints, called DSCA, which effectively utilizes supervised information to integrate dimensionality reduction and clustering. The proposed algorithm projects the data onto a low-dimensional manifold, where pairwise constraints based K-means algorithm is simultaneously used to cluster the data. Meanwhile, pairwise constraints based K-means algorithm presented in this paper reduces the computational complexity of constraints based semi-supervised algorithm and resolve the problem of violating pairwise constraints in the existing semi-supervised clustering algorithms. Experimental results on real-world datasets demonstrate that the proposed algorithm can effectively deal with high-dimensional data and provide an appealing clustering performance compared with the state-of-the-art semi-supervised algorithm.

[1]  Raymond J. Mooney,et al.  Integrating constraints and metric learning in semi-supervised clustering , 2004, ICML.

[2]  Takeo Kanade,et al.  Discriminative cluster analysis , 2006, ICML.

[3]  Carlotta Domeniconi,et al.  An Adaptive Kernel Method for Semi-supervised Clustering , 2006, ECML.

[4]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[5]  Arindam Banerjee,et al.  Active Semi-Supervision for Pairwise Constrained Clustering , 2004, SDM.

[6]  Arindam Banerjee,et al.  Semi-supervised Clustering by Seeding , 2002, ICML.

[7]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[8]  Tomer Hertz,et al.  Learning a Mahalanobis Metric from Equivalence Constraints , 2005, J. Mach. Learn. Res..

[9]  Hong Chang,et al.  Extending the relevant component analysis algorithm for metric learning using both positive and negative equivalence constraints , 2006, Pattern Recognit..

[10]  Daoqiang Zhang,et al.  Semi-Supervised Dimensionality Reduction ∗ , 2007 .

[11]  Jieping Ye,et al.  Adaptive Distance Metric Learning for Clustering , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Hui Xiong,et al.  Enhancing semi-supervised clustering: a feature projection perspective , 2007, KDD '07.

[13]  Jieping Ye,et al.  Nonlinear adaptive distance metric learning for clustering , 2007, KDD '07.

[14]  Tomer Hertz,et al.  Learning Distance Functions using Equivalence Relations , 2003, ICML.

[15]  Raymond J. Mooney,et al.  A probabilistic framework for semi-supervised clustering , 2004, KDD.

[16]  Claire Cardie,et al.  Proceedings of the Eighteenth International Conference on Machine Learning, 2001, p. 577–584. Constrained K-means Clustering with Background Knowledge , 2022 .

[17]  Lawrence K. Saul,et al.  Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifold , 2003, J. Mach. Learn. Res..