A study on semi-supervised FCM algorithm

Most variants of the fuzzy c-means (FCM) clustering algorithm that incorporate prior knowledge modify either the objective function or the clustering process. This paper proposes a new weighted semi-supervised FCM algorithm (SSFCM-HPR) that transforms the prior knowledge in the labeled samples into constraints on the fuzzy membership degrees, assigns each labeled sample a weight according to its representativeness, and then uses the HPR multiplier to solve the resulting clustering problem. The “representativeness” of a labeled sample is determined by its distance to the center of the cluster it belongs to. We take the ratio of the largest to the second-largest fuzzy membership degree of a labeled sample as its weight. The algorithm not only retains the fuzzy partition of the labeled samples, which ensures that they effectively guide the clustering process, but can also detect whether a sample is an outlier. Moreover, when part of the supervised information in the labeled samples is wrong, the algorithm reduces the influence of the incorrectly labeled samples on the final clustering results. Experimental evaluation on synthetic and real data sets demonstrates the efficiency and effectiveness of the approach.
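The weighting rule described in the abstract can be illustrated with a minimal sketch. It is not the SSFCM-HPR algorithm itself: the abstract does not say how the membership degrees of a labeled sample are obtained, so the sketch assumes the standard FCM membership formula with Euclidean distance and fuzzifier m = 2. The function names `fcm_memberships` and `representativeness_weight` are hypothetical and chosen only for illustration.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership degrees of one point to each cluster center.

    Assumption: Euclidean distance and the textbook FCM update
    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)).
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


def representativeness_weight(memberships):
    """Ratio of the largest to the second-largest membership degree."""
    top, second = sorted(memberships, reverse=True)[:2]
    return math.inf if second == 0.0 else top / second


# Two illustrative cluster centers (made-up values).
centers = [(0.0, 0.0), (4.0, 0.0)]

# A sample close to the first center: memberships near 0.9 and 0.1,
# so the weight is about 9 and the sample counts as representative.
near = fcm_memberships((1.0, 0.0), centers)
print(near, representativeness_weight(near))

# A sample halfway between the centers: memberships 0.5 and 0.5,
# so the weight is 1, the smallest value the ratio can take.
middle = fcm_memberships((2.0, 0.0), centers)
print(middle, representativeness_weight(middle))
```

Under these assumptions, a labeled sample near its cluster center receives a large weight, while an ambiguous sample between clusters receives a weight close to 1. This matches the stated intent of letting less representative labeled samples exert less influence on the clustering.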
