DenMune: Density peak based clustering using mutual nearest neighbors

Abstract: Many clustering algorithms fail when clusters have arbitrary shapes or varying densities, or when the data classes are unbalanced and close to each other, even in two dimensions. A novel clustering algorithm, “DenMune”, is presented to meet this challenge. It identifies dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user, and it obeys the mutual nearest neighbor consistency principle. The algorithm is stable over a wide range of values of K. Moreover, it automatically detects and removes noise during clustering while also detecting the target clusters. It produces robust results on various low- and high-dimensional datasets relative to several known state-of-the-art clustering algorithms.
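
To make the notion of a mutual nearest neighborhood of size K concrete, the following is a minimal sketch of the relation only, not of DenMune itself. It assumes Euclidean distance and a brute-force search; the function names `knn` and `mutual_knn` and the toy points are hypothetical and chosen for illustration. Two points are treated as mutual neighbors when each appears among the other's K nearest neighbors.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def knn(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Brute-force search: rank all other points by distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def mutual_knn(points, k):
    """Keep j as a neighbor of i only if i is also among the k nearest neighbors of j."""
    nn = knn(points, k)
    return [{j for j in nn[i] if i in nn[j]} for i in range(len(points))]


# Toy data (assumed for illustration): three close points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
mnn = mutual_knn(points, k=2)

for i, neighborhood in enumerate(mnn):
    print(i, sorted(neighborhood))
```

In this toy example the distant point ends up with an empty mutual neighborhood, because none of its nearest neighbors lists it in return. Such points are natural noise candidates under a mutual-neighbor view, although the abstract does not state how DenMune actually scores or removes noise.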
