Bayesian Feedback in Data Clustering

In many clustering applications, the user has some vague notion of the number and membership of the desired clusters. However, it is difficult for the user to supply such knowledge explicitly to the clustering process. We propose to circumvent this difficulty by introducing a feedback mechanism. The notion of Bayesian inference for relevance feedback in content-based image retrieval is adapted to data clustering. Given the number of clusters, the proposed algorithm seeks information about the target partition by asking the user a sequence of queries, each asking whether a pair of objects should be put in the same cluster. Information-theoretic criteria are used to select the queries presented to the user. The assumption made here is that cluster labels are "smooth", i.e., similar objects should share the same cluster label. We show that it is possible to obtain reasonable partitions based on the user feedback alone, without the need to specify a clustering objective function.
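The query loop described above can be illustrated with a minimal sketch. This is not the algorithm of the paper: it assumes a tiny data set whose two-cluster partitions can be enumerated, a uniform prior in place of the smoothness assumption, and a noise-free user, in which case the expected information gain of a pairwise query equals the entropy of its yes/no answer. All function names (`candidate_partitions`, `select_query`, `update`, `run_feedback`) are hypothetical.

```python
import itertools
import math


def binary_entropy(p):
    """Entropy (in bits) of a yes/no answer that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def candidate_partitions(n_objects):
    """All two-cluster labelings; object 0 is fixed to label 0 so that
    swapping the two labels does not produce a duplicate hypothesis."""
    return [(0,) + rest for rest in itertools.product((0, 1), repeat=n_objects - 1)]


def select_query(partitions, weights, n_objects):
    """Pick the pair whose same-cluster answer is most uncertain under the
    current posterior (assumption: noise-free answers)."""
    best_pair, best_h = None, -1.0
    for i, j in itertools.combinations(range(n_objects), 2):
        p_same = sum(w for part, w in zip(partitions, weights) if part[i] == part[j])
        h = binary_entropy(p_same)
        if h > best_h:
            best_pair, best_h = (i, j), h
    return best_pair


def update(partitions, weights, pair, answer_same):
    """Bayes update: hypotheses that contradict the answer get zero weight."""
    i, j = pair
    new = [w if (part[i] == part[j]) == answer_same else 0.0
           for part, w in zip(partitions, weights)]
    total = sum(new)
    return [w / total for w in new]


def run_feedback(target):
    """Simulate a user who answers according to `target`; return the
    recovered partition and the number of queries asked."""
    n = len(target)
    partitions = candidate_partitions(n)
    weights = [1.0 / len(partitions)] * len(partitions)  # uniform prior (assumption)
    n_queries = 0
    while max(weights) < 1.0 - 1e-9:
        i, j = select_query(partitions, weights, n)
        answer_same = target[i] == target[j]  # simulated user feedback
        weights = update(partitions, weights, (i, j), answer_same)
        n_queries += 1
    return partitions[weights.index(max(weights))], n_queries


recovered, n_queries = run_feedback((0, 0, 1, 1))
print(recovered, n_queries)
```

Enumerating partitions is feasible only for a handful of objects; the sketch is meant to show the select-query, observe-answer, update-posterior cycle, not a scalable implementation.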
