An Automatic Clustering Algorithm Based on Region Segmentation

Clustering, an unsupervised learning technique, is widely used in practice. This paper proposes a novel clustering algorithm based on region segmentation (CRS). It aims to determine automatically, from the data density, both the optimal number of clusters and the clusters themselves. First, a new data density measure is defined using reverse nearest neighbor enhancement, which makes cluster detection more effective. Then, multiple sub-region centers are determined from the data density. Finally, a merge criterion is proposed so that related regions are merged to obtain the final clustering result. The proposed algorithm requires neither the number of clusters in advance nor a threshold parameter, so it can be applied more widely. In the experiments, we compare the performance of CRS with the DBSCAN, IS-DBSCAN, STClu, DP, and SCDOT algorithms on synthetic and real-world data sets. The experimental results show that the NMI, ACC, F1, and ARI values obtained by CRS are consistently better than those obtained by the other algorithms on the same data sets.
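To illustrate the idea of a density derived from reverse nearest neighbors, the following is a minimal sketch, not the density definition used by CRS, which the abstract does not specify. It assumes a plain reverse k-nearest-neighbor count: for each point, the number of other points that list it among their k nearest neighbors. The function name `reverse_knn_counts` and the parameter `k` are hypothetical and introduced only for this example.

```python
import math


def reverse_knn_counts(points, k):
    """Return, for each point, how many other points have it among their
    k nearest neighbors (Euclidean distance).

    Assumption: this plain count is only a stand-in for a density
    "enhanced" by reverse nearest neighbors; it is not the CRS formula.
    """
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank all other points by their distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        # Each of the k nearest neighbors of i gains one reverse neighbor.
        for j in others[:k]:
            counts[j] += 1
    return counts
```

Under this assumption, points inside a dense region tend to collect many reverse neighbors, while an isolated point collects few or none, which is why such counts can help separate cluster cores from outliers.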