Active semi-supervised fuzzy clustering

Clustering algorithms are increasingly used to categorize image databases, providing users with overviews of a database and making access to its contents more effective. By incorporating information supplied by the user, the categorization process can produce results that come closer to the user's expectations. For such a semi-supervised approach to be acceptable to the user, this information must be very simple, and the amount the user is required to provide must be kept to a minimum. We propose a semi-supervised clustering algorithm, active fuzzy constrained clustering (AFCC), that minimizes a competitive agglomeration cost function augmented with fuzzy terms corresponding to pairwise constraints provided by the user. To minimize the number of constraints required, we define an active mechanism for selecting candidate constraints. Comparisons on a simple benchmark and on a ground-truth image database show that AFCC can significantly improve clustering results with only a few constraints, making this semi-supervised approach an attractive alternative for categorizing image databases.
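To make the kind of objective described above more concrete, the sketch below evaluates a cost with three parts:

- a competitive agglomeration part (fuzzy within-cluster scatter minus a squared-cardinality competition term);
- penalty terms for pairwise constraints;
- a simple rule for choosing the next pair to show the user.

It assumes the constraints are must-link and cannot-link index pairs. The specific penalty form, the weights `alpha` and `beta`, and the function names `constrained_ca_cost` and `most_ambiguous_pair` are illustrative assumptions, not the AFCC implementation. The selection function is a generic membership-ambiguity heuristic standing in for the active mechanism, not the paper's own criterion.

```python
# Minimal sketch under stated assumptions (NOT the AFCC implementation):
# a competitive-agglomeration cost with hypothetical pairwise-constraint
# penalties, plus a hypothetical ambiguity-based constraint-selection rule.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_ca_cost(X, U, centers, must_link, cannot_link, alpha, beta):
    """Cost = fuzzy scatter + alpha * constraint penalties - beta * cardinality term.

    U[i][k] is the membership of point i in cluster k (each row sums to 1).
    must_link / cannot_link are lists of index pairs (i, j).
    """
    n, c = len(X), len(centers)
    # Within-cluster scatter weighted by squared memberships.
    scatter = sum(U[i][k] ** 2 * sq_dist(X[i], centers[k])
                  for i in range(n) for k in range(c))
    # Must-link: penalize membership mass that places the pair in different clusters.
    ml = sum(U[i][k] * U[j][l]
             for i, j in must_link
             for k in range(c) for l in range(c) if l != k)
    # Cannot-link: penalize membership mass that places the pair in the same cluster.
    cl = sum(U[i][k] * U[j][k] for i, j in cannot_link for k in range(c))
    # Competition term: squared fuzzy cardinalities favour fewer, larger clusters.
    card = sum(sum(U[i][k] for i in range(n)) ** 2 for k in range(c))
    return scatter + alpha * (ml + cl) - beta * card


def most_ambiguous_pair(U, asked):
    """Hypothetical active-selection heuristic (requires at least two clusters).

    Visit points from most to least ambiguous (smallest gap between their two
    largest memberships) and pair each with the most typical member of its
    runner-up cluster; return the first pair not already asked, else None.
    """
    n = len(U)

    def margin(i):
        top = sorted(U[i], reverse=True)
        return top[0] - top[1]

    for i in sorted(range(n), key=margin):
        ranked = sorted(range(len(U[i])), key=lambda k: U[i][k], reverse=True)
        runner_up = ranked[1]
        # Most typical member (highest membership) of the runner-up cluster.
        j = max((p for p in range(n) if p != i), key=lambda p: U[p][runner_up])
        pair = (min(i, j), max(i, j))
        if pair not in asked:
            return pair
    return None
```

A complete algorithm would still need two further steps, both omitted from this sketch:

- alternately update the memberships and prototypes so that the cost decreases;
- add the user's answer about the selected pair to `must_link` or `cannot_link` before the next round.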
