Conditional semi-fuzzy c-means clustering for imbalanced dataset

Fuzzy c-means algorithms have been widely used in areas such as image segmentation, pattern recognition and data mining. However, related studies have shown their limitations on imbalanced datasets. The maximum fuzzy boundary tends to fall within the largest cluster, which is undesirable. The overall fuzzy partition leads to false grouping of edge objects and weakens cluster compactness. It is important that the clusters are delineated by the maximum fuzzy boundary. In this study, a semi-fuzzy c-means algorithm is proposed that combines hard and soft partitions. It aims to partition the edge objects effectively so that cluster compactness is improved. The proposed algorithm integrates the semi-fuzzy c-means method with the size-insensitive integrity-based fuzzy c-means algorithm; the latter is able to deal with imbalanced data. Experimental validation on synthetic and widely known benchmark datasets shows that the proposed algorithm is robust and outperforms its two component algorithms.
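
The idea of combining a hard and a soft partition can be illustrated with a minimal sketch. It uses the standard fuzzy c-means membership and centre updates, followed by a hardening step. The hardening rule, the `threshold` parameter and all function names are assumptions made for illustration, not the algorithm proposed in the study, and the size-insensitive integrity-based weighting is not included.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard fuzzy c-means memberships, shape (n_points, n_clusters)."""
    # Squared Euclidean distance from every point to every centre.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # avoid division by zero at a centre
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def update_centers(X, U, m=2.0):
    """Standard fuzzy c-means centre update with fuzzifier m."""
    W = U ** m
    totals = np.maximum(W.sum(axis=0), 1e-12)  # guard against empty clusters
    return (W.T @ X) / totals[:, None]

def semi_fuzzy_harden(U, threshold=0.8):
    """Hypothetical hardening rule (an assumption, not the paper's rule):
    rows whose largest membership reaches `threshold` become one-hot,
    all other rows keep their fuzzy memberships."""
    U = U.copy()
    winners = U.argmax(axis=1)
    confident = U.max(axis=1) >= threshold
    U[confident] = 0.0
    U[confident, winners[confident]] = 1.0
    return U

def semi_fuzzy_cmeans(X, c, m=2.0, threshold=0.8, iters=50, seed=0):
    """Alternate membership, hardening and centre updates for `iters` rounds."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(iters):
        U = semi_fuzzy_harden(fcm_memberships(X, centers, m), threshold)
        centers = update_centers(X, U, m)
    return U, centers
```

In this sketch, points close to a centre receive a crisp assignment, while points near the boundary between clusters stay fuzzy. On an imbalanced dataset the plain update above would still be biased towards the larger cluster, which is the gap the size-insensitive component is meant to address.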
