Semi-supervised clustering with metric learning using relative comparisons

Semi-supervised clustering algorithms partition a data set using limited supervision from the user. Their success depends both on the type of supervision and on the dissimilarity measure used to partition the space. This paper proposes a clustering algorithm whose supervision takes the form of relative comparisons, viz., x is closer to y than to z. The proposed algorithm uses these comparisons to learn the underlying dissimilarity measure while simultaneously finding compact clusters in the data set. Through experimental studies on high-dimensional textual data sets, we demonstrate that the proposed algorithm achieves higher accuracy and is more robust than similar algorithms that use pairwise constraints for supervision.
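
To make the notion of a relative comparison concrete, the sketch below checks whether a constraint of the form "x is closer to y than to z" holds under a given dissimilarity measure. It is only an illustration under stated assumptions: the per-feature weighted squared Euclidean distance, the function names `weighted_sq_dist` and `violated_comparisons`, and the toy data are hypothetical and are not taken from the paper, whose actual dissimilarity measure and learning procedure are not described in this abstract.

```python
import numpy as np


def weighted_sq_dist(a, b, w):
    """Squared Euclidean distance with non-negative per-feature weights w.

    Assumption: a simple diagonal weighting stands in for the learned
    dissimilarity measure; the paper's measure may differ.
    """
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sum(w * diff * diff))


def violated_comparisons(X, triplets, w):
    """Return the triplets (i, j, k) for which x_i is NOT closer to x_j than to x_k."""
    return [
        (i, j, k)
        for (i, j, k) in triplets
        if weighted_sq_dist(X[i], X[j], w) >= weighted_sq_dist(X[i], X[k], w)
    ]


# Toy data (hypothetical): three points in two dimensions.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 2.0]])

# One relative comparison: "point 0 is closer to point 1 than to point 2".
triplets = [(0, 1, 2)]

uniform = np.array([1.0, 1.0])     # plain squared Euclidean distance
reweighted = np.array([0.1, 1.0])  # down-weights the first feature

print(violated_comparisons(X, triplets, uniform))     # [(0, 1, 2)]
print(violated_comparisons(X, triplets, reweighted))  # []
```

Under uniform weights the comparison is violated (squared distance 9 versus 4), whereas down-weighting the first feature satisfies it (0.9 versus 4). This is the sense in which changing the dissimilarity measure can bring the geometry into agreement with the user's supervision.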