Fuzzy C-Means clustering algorithm for data with unequal cluster sizes and contaminated with noise and outliers: Review and development

Abstract

Clustering algorithms aim to find dense regions of data based on the similarities and dissimilarities of data points. Noise and outliers enter the computations of these algorithms just as the actual data points do, which leads to inaccurate, misplaced cluster centers. The same problem arises when clusters differ in size: the centers of small clusters are pulled towards large clusters. In engineering and physics, the mass of the data points matters as much as their location, and a non-uniform mass distribution displaces the cluster centers towards heavier clusters even when the clusters are of identical size and the data are noise-free. The Fuzzy C-Means (FCM) algorithm, the most popular fuzzy clustering algorithm, suffers from these problems; it has been the subject of numerous studies and extensions, yet the improvements remain marginal. This work revises the FCM algorithm to make it applicable to data with unequal cluster sizes, noise and outliers, and non-uniform mass distribution. The Revised FCM (RFCM) algorithm employs adaptive exponential functions to eliminate the impact of noise and outliers on the cluster centers, and it modifies the constraint of the FCM algorithm to prevent large or heavier clusters from attracting the centers of small clusters.
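To make the center-displacement problem concrete, the following NumPy sketch implements the standard FCM alternating updates (Bezdek's formulation), not RFCM, whose update rules are not reproduced here. The optional `mass` argument is an assumption: masses are treated as per-point multipliers in the FCM objective, as in weighted FCM (wFCM) for very large data, which may differ from how the paper models mass. The function name `fcm`, the synthetic data, and the seeds are hypothetical.

```python
import numpy as np


def fcm(X, c, m=2.0, mass=None, max_iter=500, tol=1e-7, seed=0):
    """Standard fuzzy c-means with optional per-point masses (assumed weighting).

    Alternating optimization of sum_k mass_k * sum_i u_ik**m * ||x_k - v_i||**2
    subject to the FCM constraint sum_i u_ik = 1 for every point k.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    w = np.ones(n) if mass is None else np.asarray(mass, dtype=float)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                            # random fuzzy partition
    for _ in range(max_iter):
        Wm = (U ** m) * w                         # (c, n) mass-weighted u_ik**m
        V = (Wm @ X) / Wm.sum(axis=1)[:, None]    # (c, d) cluster centers
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)                   # guard against zero distance
        # Masses cancel here: u_ik = 1 / sum_j (d_ik/d_jk)**(2/(m-1))
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    Wm = (U ** m) * w                             # centers consistent with final U
    V = (Wm @ X) / Wm.sum(axis=1)[:, None]
    return V, U


# Hypothetical data: a large cluster (300 points) and a small one (30 points).
rng = np.random.default_rng(1)
big = rng.normal([0.0, 0.0], 1.0, size=(300, 2))
small = rng.normal([5.0, 0.0], 0.3, size=(30, 2))
X = np.vstack([big, small])

V, U = fcm(X, c=2)
v_small = V[np.argmax(V[:, 0])]                   # center on the small-cluster side
print("mean of small cluster:", small.mean(axis=0).round(2))
print("FCM center           :", v_small.round(2))

# Same points, but each point of the large cluster has mass 2.
mass = np.r_[np.full(300, 2.0), np.ones(30)]
V_heavy, _ = fcm(X, c=2, mass=mass)
v_small_heavy = V_heavy[np.argmax(V_heavy[:, 0])]
print("FCM center, heavy large cluster:", v_small_heavy.round(2))
```

Because a point's mass multiplies all of its terms equally, it cancels in the membership update and acts only through the center formula. Under these assumptions one would expect the small-cluster center to sit on the large-cluster side of the small cluster's mean, and further still when the large cluster is made heavier.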
The paper reviews several algorithms and discusses their mathematical structures, including Possibilistic Fuzzy C-Means (PFCM), Possibilistic C-Means (PCM), Robust Fuzzy C-Means (FCM-σ), Noise Clustering (NC), Kernel Fuzzy C-Means (KFCM), Intuitionistic Fuzzy C-Means (IFCM), Robust Kernel Fuzzy C-Means (KFCM-σ), Robust Intuitionistic Fuzzy C-Means (IFCM-σ), Kernel Intuitionistic Fuzzy C-Means (KIFCM), Robust Kernel Intuitionistic Fuzzy C-Means (KIFCM-σ), Credibilistic Fuzzy C-Means (CFCM), Size-insensitive integrity-based Fuzzy C-Means (siibFCM), Cluster-size-insensitive Fuzzy C-Means (csiFCM), Subtractive Clustering (SC), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), Spectral Clustering, and Outlier Removal Clustering (ORC). Some of these algorithms are suited to noisy data, while others are designed for data with clusters of unequal size. The study shows that the RFCM algorithm handles both cases and outperforms both categories of algorithms.
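Noise Clustering (NC), one of the reviewed noise-robust algorithms, is short enough to sketch. In Davé's formulation a virtual noise cluster sits at a fixed distance δ from every point, so the memberships of a point in the real clusters sum to at most one and distant points receive small memberships everywhere. The NumPy sketch below illustrates NC, not RFCM; the function name `nc_fcm`, the choice δ = 3, and the synthetic data are hypothetical. Letting δ go to infinity removes the noise term and recovers plain FCM, so the comparison is a single parameter change.

```python
import numpy as np


def nc_fcm(X, c, delta=np.inf, m=2.0, max_iter=500, tol=1e-7, seed=0):
    """Noise Clustering in the style of Dave (1991); delta=np.inf gives plain FCM.

    A virtual noise cluster lies at distance delta from every point, so the
    memberships of a point in the c real clusters sum to at most 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)
    p = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1)[:, None]          # centers use real clusters only
        D2 = np.fmax(((X[None] - V[:, None]) ** 2).sum(axis=2), 1e-12)
        # u_ik = 1 / (sum_j (d_ik^2 / d_jk^2)^p + (d_ik^2 / delta^2)^p)
        ratio = (D2[:, None, :] / D2[None, :, :]) ** p   # (c, c, n)
        U_new = 1.0 / (ratio.sum(axis=1) + (D2 / delta**2) ** p)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    Um = U ** m
    V = (Um @ X) / Um.sum(axis=1)[:, None]
    return V, U


# Hypothetical data: two equal clusters plus 10 outliers far above them.
rng = np.random.default_rng(2)
left = rng.normal([0.0, 0.0], 0.5, size=(100, 2))
right = rng.normal([5.0, 0.0], 0.5, size=(100, 2))
outliers = rng.normal([2.5, 8.0], 0.2, size=(10, 2))
X = np.vstack([left, right, outliers])

V_fcm, _ = nc_fcm(X, c=2)                  # delta = inf: ordinary FCM
V_nc, U_nc = nc_fcm(X, c=2, delta=3.0)     # noise distance picked by hand
for name, V in [("FCM", V_fcm), ("NC", V_nc)]:
    v_left = V[np.argmin(V[:, 0])]
    offset = np.linalg.norm(v_left - left.mean(axis=0))
    print(f"{name}: left center {v_left.round(3)}, offset from cluster mean {offset:.3f}")
```

Here δ is a user-chosen scale: it sets how far a point must lie from every center before the noise cluster absorbs most of its membership. With the outliers roughly equidistant from both clusters, plain FCM gives each of them a membership of about 0.5 in both clusters, which is what drags both centers towards the outliers.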
