Validity of Fuzzy Clustering Using Entropy Regularization

We introduce in this paper a new formulation of the regularized fuzzy c-means (FCM) algorithm which allows us to find the actual number of clusters automatically. The approach is based on the minimization of an objective function which mixes, via a particular parameter, a classical FCM term and a new entropy regularizer. The main contribution of the method is the introduction of a new exponential form of the fuzzy memberships which ensures the consistency of their bounds and makes it possible to interpret the mixing parameter as the variance (or scale) of the clusters. This variance, closely related to the number of clusters, provides us with an intuitive and easy-to-set parameter. We discuss the proposed approach from the regularization point of view and demonstrate its validity both analytically and experimentally. We also show an extension of the method to nonlinearly separable data. Finally, we illustrate preliminary results both on simple toy examples and on database categorization problems.
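
To make the idea of exponential memberships concrete, the following is a minimal sketch of the generic, textbook entropy-regularized fuzzy c-means iteration. It is not the exact formulation proposed in this paper, whose objective is not spelled out in the abstract. It assumes an objective of the form sum_ik u_ik d_ik^2 + lam * sum_ik u_ik log u_ik, for which the memberships take a softmax form and lam acts as a scale. The function name `entropy_fcm` and its parameters are hypothetical, and the sketch takes the number of clusters as an input rather than determining it automatically.

```python
import numpy as np

def entropy_fcm(X, n_clusters, lam, n_iter=100, seed=0):
    """Generic entropy-regularized fuzzy c-means (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers from randomly chosen data points.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distances, shape (n_points, n_clusters).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Exponential memberships: softmax of -d2 / lam across clusters,
        # so each row lies in [0, 1] and sums to 1 by construction.
        logits = -d2 / lam
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        U = np.exp(logits)
        U /= U.sum(axis=1, keepdims=True)
        # Each center becomes the membership-weighted mean of the data.
        weights = np.maximum(U.sum(axis=0), 1e-12)
        new_centers = (U.T @ X) / weights[:, None]
        converged = np.allclose(new_centers, centers, atol=1e-8)
        centers = new_centers
        if converged:
            break
    # U holds the memberships computed in the final iteration.
    return centers, U
```

In this generic form, a small `lam` gives nearly hard assignments, while a large `lam` pushes every membership toward 1 / n_clusters, which is one way to see why the mixing parameter can be read as a cluster scale.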
