Semi-supervised fuzzy clustering with metric learning and entropy regularization

Existing semi-supervised fuzzy c-means (FCM) methods suffer from two issues: (1) the Euclidean distance tends to work poorly when the features of the instances have unequal variances and are correlated with one another, and (2) it is generally difficult to choose an appropriate value for the parameter m in their objective function. To address these problems, we develop a novel semi-supervised metric-based fuzzy clustering algorithm, called SMUC, by introducing metric learning and entropy regularization simultaneously into the conventional fuzzy clustering algorithm. More specifically, SMUC learns a Mahalanobis distance metric from side information given by the user and uses it in place of the Euclidean distance in FCM-based methods. It thus has the same flavor as typical supervised metric learning algorithms, which make the distance between instances within a cluster smaller than that between instances belonging to different clusters. Moreover, SMUC introduces maximum entropy as a regularization term in its objective function, so that the resulting formulas have a clear physical meaning compared with those of other semi-supervised FCM methods. This maximum-entropy regularizer also naturally removes the need to choose the parameter m. Experiments on real-world data sets demonstrate the feasibility and effectiveness of the proposed method.
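The two ingredients named above can be illustrated with a minimal sketch. It is not the SMUC algorithm itself: the abstract does not give SMUC's objective or update formulas, so the code below uses the standard maximum-entropy fuzzy clustering updates (memberships as a softmax of negative distances, centers as membership-weighted means) together with a Mahalanobis distance whose matrix `A` is supplied by the caller rather than learned from side information. The function names, the temperature-like parameter `gamma`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis_sq(X, V, A):
    """Squared Mahalanobis distances (x - v)^T A (x - v), shape (n, c)."""
    diff = X[:, None, :] - V[None, :, :]           # (n, c, d)
    return np.einsum("ncd,de,nce->nc", diff, A, diff)

def entropy_memberships(D, gamma):
    """Maximum-entropy memberships: softmax of -D / gamma over clusters.

    There is no fuzzifier exponent m here; gamma (hypothetical name)
    weights the entropy term instead.
    """
    logits = -D / gamma
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    U = np.exp(logits)
    return U / U.sum(axis=1, keepdims=True)

def update_centers(X, U):
    """Cluster centers as membership-weighted means of the instances."""
    return (U.T @ X) / U.sum(axis=0)[:, None]

def fit(X, V0, A, gamma=1.0, n_iter=50):
    """Alternate membership and center updates for a fixed metric A."""
    V = V0.copy()
    for _ in range(n_iter):
        U = entropy_memberships(mahalanobis_sq(X, V, A), gamma)
        V = update_centers(X, U)
    return U, V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                   rng.normal(5.0, 0.1, size=(20, 2))])
    A = np.eye(2)                  # identity metric reduces to Euclidean
    U, V = fit(X, X[[0, -1]], A)
    print(U.argmax(axis=1))
    print(V)
```

With `A` set to the identity matrix the distance reduces to the squared Euclidean distance; a learned positive semidefinite `A` would rescale and decorrelate the features, which is the role the abstract assigns to metric learning.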
