Soft clustering - Fuzzy and rough approaches and their extensions and derivatives

Clustering is one of the most widely used approaches in data mining, with real-life applications in virtually every domain. The strong interest in clustering has led to a large number of algorithms, possibly in the hundreds, with the k-means family probably the most widely used group of methods. Besides classic bivalent approaches, clustering algorithms belonging to the domain of soft computing have been proposed and successfully applied over the past four decades. Bezdek's fuzzy c-means is a prominent example of such soft computing clustering algorithms, with many effective real-life applications. More recently, Lingras and West enriched this area by introducing rough k-means. In this article we compare k-means with fuzzy c-means and rough k-means as important representatives of soft clustering. On the basis of this comparison, we then survey important extensions and derivatives of these algorithms; our particular interest is in hybrid clustering, which merges fuzzy and rough concepts. We also give some examples of studies in which k-means, rough k-means, and fuzzy c-means have been applied.
