A review and proposal of (fuzzy) clustering for nonlinearly separable data

Abstract In many practical situations, data are characterized by nonlinearly separable clusters. Classical (hard or fuzzy) clustering algorithms partition the objects on the basis of the Euclidean distance. As such, they rely on a linearity assumption and therefore fail to properly identify clusters with nonlinear structures. Several approaches can overcome this limitation: density-, kernel-, graph- or manifold-based clustering. A review of these approaches is offered, and some new fuzzy manifold-based clustering algorithms, involving the so-called geodesic distance, are proposed. The effectiveness of these algorithms is shown on synthetic, benchmark and real data.
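
To illustrate why a geodesic distance can capture nonlinear structure where the Euclidean distance does not, the following minimal sketch approximates geodesic distances by shortest paths on a k-nearest-neighbour graph (the Isomap-style construction). It is not the algorithms proposed in the paper: the helper names `knn_graph` and `geodesic_from`, the choice k = 2 and the toy arc data are all assumptions made for illustration.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrise so the graph is undirected
    return graph


def geodesic_from(graph, source):
    """Dijkstra shortest-path lengths from `source`; unreachable nodes stay inf."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data (assumption): 40 points along three quarters of a unit circle.
n = 40
angles = [1.5 * math.pi * i / (n - 1) for i in range(n)]
points = [(math.cos(t), math.sin(t)) for t in angles]

graph = knn_graph(points, k=2)
geo = geodesic_from(graph, 0)

euclid_ends = math.dist(points[0], points[-1])  # straight line across the gap
geo_ends = geo[n - 1]                           # path that follows the arc

print(f"Euclidean distance between the arc's endpoints: {euclid_ends:.3f}")
print(f"Graph-geodesic distance between the endpoints:  {geo_ends:.3f}")
```

The two endpoints of the arc are close in the Euclidean sense but far apart along the curve, which is the property a manifold-based clustering method relies on. In this sketch the neighbourhood size k controls the approximation: too small a value can disconnect the graph (leaving infinite distances), while too large a value can create shortcut edges across the gap.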
