Paper information - CMUNE: A clustering using mutual nearest neighbors algorithm

A novel clustering algorithm, CMune, is presented for finding clusters of arbitrary shapes, sizes and densities in high-dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor (SNN) algorithm, in which each data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions, or blocks. These blocks are the seeds from which clusters may grow. CMune is therefore not a point-to-point clustering algorithm but a block-to-block clustering technique. Many of its advantages come from this fact: noise points and outliers correspond to small blocks, while homogeneous blocks overlap heavily. The algorithm has been applied to a variety of low- and high-dimensional data sets, with results superior to those of existing techniques such as K-means, DBSCAN, Mitosis and spectral clustering. Both the quality of its results and its time complexity place it ahead of these techniques.
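The abstract does not spell out how blocks are built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes Euclidean distance, a brute-force neighbor search, and a hypothetical definition of a block as a point together with its mutual k-nearest neighbors (points j such that i is among the k nearest neighbors of j and j is among those of i). The function names `k_nearest` and `mutual_neighbor_blocks` are illustrative and do not come from the paper.

```python
from math import dist


def k_nearest(points, k):
    """Return, for each point index, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Brute-force search: sort all other points by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def mutual_neighbor_blocks(points, k):
    """Build one candidate block per point from its mutual k-nearest neighbors.

    Assumption (not taken from the paper): the block of point i is i itself
    plus every j with j in kNN(i) and i in kNN(j).
    """
    knn = k_nearest(points, k)
    blocks = []
    for i in range(len(points)):
        mutual = {j for j in knn[i] if i in knn[j]}
        blocks.append(mutual | {i})
    return blocks


# Two tight groups of three points each, plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 30)]
blocks = mutual_neighbor_blocks(points, k=2)
for i, block in enumerate(blocks):
    print(i, sorted(block))
```

Under these assumptions, the points in each tight group produce identical, fully overlapping blocks, while the isolated point has no mutual neighbors and ends up in a block of size one. This matches the abstract's remark that noise points and outliers correspond to small blocks.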

Amin A. Shoukry | Mohamed A. Abbas
