CMUNE: A clustering using mutual nearest neighbors algorithm

A novel clustering algorithm, CMune, is presented for finding clusters of arbitrary shapes, sizes, and densities in high-dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor (SNN) algorithm, in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions, or blocks. These blocks are the seeds from which clusters may grow. CMune is therefore not a point-to-point clustering algorithm but a block-to-block clustering technique. Many of its advantages come from this fact: noise points and outliers correspond to blocks of small size, and homogeneous blocks overlap heavily. The algorithm has been applied to a variety of low- and high-dimensional data sets, with superior results over existing techniques such as K-means, DBSCAN, Mitosis, and spectral clustering. The quality of its results, as well as its time complexity, places it at the front of these techniques.
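The block idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how blocks are built or merged, so the sketch assumes that the block seeded at point i is i together with its mutual k-nearest neighbors (points j such that j is among the k nearest neighbors of i and i is among the k nearest neighbors of j). The function names `knn_indices` and `mutual_neighbor_blocks` are hypothetical, and the brute-force distance computation is chosen only for brevity.

```python
import numpy as np

def knn_indices(X, k):
    """Return an (n, k) array of each point's k nearest neighbors (brute force)."""
    # Pairwise squared Euclidean distances, O(n^2) memory and time.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def mutual_neighbor_blocks(X, k):
    """Assumed block definition: point i plus its mutual k-nearest neighbors."""
    n = len(X)
    nn = knn_indices(X, k)
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], nn] = True   # is_nn[i, j]: j is in kNN(i)
    mutual = is_nn & is_nn.T                  # both directions must hold
    return [set(np.flatnonzero(mutual[i]).tolist()) | {i} for i in range(n)]

# Two tight groups of four points and one distant outlier (index 8).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [50, 50]], dtype=float)
blocks = mutual_neighbor_blocks(X, k=3)
block_sizes = [len(b) for b in blocks]
print(block_sizes)
```

In this toy data set, each point in a tight group gets a block covering its whole group, so blocks within a group coincide, while the outlier has no mutual neighbors and its block contains only itself. This mirrors the abstract's observation that outliers yield small blocks and homogeneous blocks overlap heavily.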
