Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding

A novel density-based clustering algorithm named QCC was presented recently. Although the algorithm has demonstrated strong robustness, its two input parameters, the number of neighbors (k) and the similarity threshold (α), still have to be set manually, which severely limits its wider adoption. In addition, QCC does not perform well on datasets with relatively high dimensionality. To overcome these defects, we first define a new method for computing local density and introduce the strategy of potential entropy into the original algorithm. Based on this idea, we propose a new QCC clustering algorithm (QCC-PE). QCC-PE automatically extracts the optimal value of the parameter k by optimizing the potential entropy of the data field, so the parameter is computed objectively from the dataset itself rather than estimated empirically from a large number of experiments. We then apply t-distributed stochastic neighbor embedding (tSNE) to the QCC-PE model and further propose a tSNE-based method (QCC-PE-tSNE), which preprocesses high-dimensional datasets by dimensionality reduction. We compare the performance of the proposed algorithms with QCC, DBSCAN, and DP on synthetic datasets, the Olivetti Face Database, and real-world datasets, respectively. Experimental results show that our algorithms are feasible, effective, and often outperform these baselines.
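The abstract does not spell out the exact density and entropy formulas, so the following Python sketch only illustrates the general idea behind QCC-PE's parameter selection, under assumptions: for each candidate k, a data field is built from kNN distances with a Gaussian kernel (an assumed local-density estimate, not necessarily the authors' definition), and the k whose normalized potentials have minimal entropy is kept.

```python
# Minimal sketch (not the authors' exact formulation): choose the neighbor
# count k for QCC-PE by minimizing the potential entropy of a data field.
# The kNN/Gaussian-kernel potential used here is an illustrative assumption.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def potential_entropy_for_k(X, k):
    """Entropy of the normalized data-field potentials when each point's
    potential is estimated from its k nearest neighbors (Gaussian kernel)."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nbrs.kneighbors(X)                 # dist[:, 0] is the point itself
    sigma = dist[:, 1:].mean()                   # impact factor: mean kNN distance (assumed heuristic)
    phi = np.exp(-(dist[:, 1:] / sigma) ** 2).sum(axis=1)   # potential of each point
    p = phi / phi.sum()                          # normalize potentials to a distribution
    return -(p * np.log(p)).sum()

def select_k(X, k_candidates=range(3, 31)):
    """Return the candidate k whose data field has minimal potential entropy."""
    entropies = {k: potential_entropy_for_k(X, k) for k in k_candidates}
    return min(entropies, key=entropies.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
    print("selected k:", select_k(X))
```

Similarly, a minimal sketch of the QCC-PE-tSNE preprocessing stage: embed a high-dimensional dataset (here the Olivetti faces mentioned in the abstract) into two dimensions with tSNE before running a density-based clusterer. DBSCAN stands in for QCC-PE only because no public QCC-PE implementation is assumed to be available; the eps and min_samples values are illustrative.

```python
# Minimal sketch of tSNE preprocessing before density-based clustering.
from sklearn.datasets import fetch_olivetti_faces
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

faces = fetch_olivetti_faces()                # 400 face images, 4096 features each
X_low = TSNE(n_components=2, perplexity=30, init="pca",
             random_state=0).fit_transform(faces.data)

labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(X_low)   # illustrative parameters
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
```

In this pipeline the embedding only changes the representation the clustering step sees, so any of the compared density-based algorithms (QCC, QCC-PE, DBSCAN, DP) could consume the low-dimensional points.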

[1] Qingsheng Zhu et al., QCC: a novel clustering algorithm based on Quasi-Cluster Centers, 2017, Machine Learning.

[2] Yanling Li et al., An automatic fuzzy c-means algorithm for image segmentation, 2009, Soft Comput.

[3] Vipin Kumar et al., Chameleon: Hierarchical Clustering Using Dynamic Modeling, 1999, Computer.

[4] Alfredo Ferro et al., Enhancing density-based clustering: Parameter reduction and outlier detection, 2013, Inf. Syst.

[5] Shifei Ding et al., A semi-supervised approximate spectral clustering algorithm based on HMRF model, 2018, Inf. Sci.

[6] Dunja Mladenic et al., The Role of Hubness in Clustering High-Dimensional Data, 2014, IEEE Trans. Knowl. Data Eng.

[7] Barbara Hammer et al., Parametric nonlinear dimensionality reduction using kernel t-SNE, 2015, Neurocomputing.

[8] Alessandro Laio et al., Clustering by fast search and find of density peaks, 2014, Science.

[9] Assaf Gottlieb et al., Algorithm for data clustering in pattern recognition problems based on quantum mechanics, 2001, Physical Review Letters.

[10] Jiong Yang et al., STING: A Statistical Information Grid Approach to Spatial Data Mining, 1997, VLDB.

[11] Charles T. Zahn, Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters, 1971, IEEE Transactions on Computers.

[12] Xiao Xu et al., An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood, 2017, Knowl. Based Syst.

[13] Dunja Mladenic et al., The Role of Hubness in Clustering High-Dimensional Data, 2011, IEEE Transactions on Knowledge and Data Engineering.

[14] Hans-Peter Kriegel et al., OPTICS: ordering points to identify the clustering structure, 1999, SIGMOD '99.

[15] Hans-Peter Kriegel et al., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[16] Yu-Chieh Wu, A top-down information theoretic word clustering algorithm for phrase recognition, 2014, Inf. Sci.

[17] Dorin Comaniciu et al., Mean Shift: A Robust Approach Toward Feature Space Analysis, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[18] Hae-Sang Park et al., A simple and fast algorithm for K-medoids clustering, 2009, Expert Syst. Appl.

[19] A. Rama Mohan Reddy et al., A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups method, 2016, Pattern Recognit.

[20] Sudipto Guha et al., ROCK: a robust clustering algorithm for categorical attributes, 1999, Proceedings 15th International Conference on Data Engineering.

[21] T. Soni Madhulatha, An Overview on Clustering Methods, 2012, ArXiv.

[22] Douglas H. Fisher et al., Knowledge Acquisition Via Incremental Conceptual Clustering, 1987, Machine Learning.

[23] Laurens van der Maaten et al., Accelerating t-SNE using tree-based algorithms, 2014, J. Mach. Learn. Res.

[24] Teuvo Kohonen et al., The self-organizing map, 1990.

[25] Rongfang Bie et al., Clustering by fast search and find of density peaks via heat diffusion, 2016, Neurocomputing.

[26] Weixin Xie et al., Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors, 2016, Inf. Sci.

[27] Tian Zhang et al., BIRCH: an efficient data clustering method for very large databases, 1996, SIGMOD '96.

[28] Bartlomiej Zielinski et al., Tree-based binary image dissimilarity measure with meta-heuristic optimization, 2015, Pattern Analysis and Applications.

[29] Nicolas Brunel et al., Stimulus Dependence of Local Field Potential Spectra: Experiment versus Theory, 2014, The Journal of Neuroscience.

[30] Andries Petrus Engelbrecht et al., An overview of clustering methods, 2007, Intell. Data Anal.

[31] Shuliang Wang et al., Clustering by Fast Search and Find of Density Peaks with Data Field, 2016.

[32] Stephen Grossberg et al., A massively parallel architecture for a self-organizing neural pattern recognition machine, 1988, Comput. Vis. Graph. Image Process.

[33] Carl E. Rasmussen et al., The Infinite Gaussian Mixture Model, 1999, NIPS.

[34] Geoffrey E. Hinton et al., Visualizing Data using t-SNE, 2008.

[35] K. F. Chen et al., Observation of B± → pp̄K±, 2002.

[36] Arun K. Pujari et al., QROCK: A quick version of the ROCK algorithm for clustering of categorical data, 2005, Pattern Recognit. Lett.

[37] Dit-Yan Yeung et al., Robust path-based spectral clustering, 2008, Pattern Recognit.

[38] Kristin J. Dana et al., Modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) for visual clustering, 2016, Pattern Analysis and Applications.

[39] Wenke Zang et al., Automatic Density Peaks Clustering Using DNA Genetic Algorithm Optimized Data Field and Gaussian Process, 2017, Int. J. Pattern Recognit. Artif. Intell.

[40] Tian Zhang et al., BIRCH: A New Data Clustering Algorithm and Its Applications, 1997, Data Mining and Knowledge Discovery.

[41] Sean Hughes et al., Clustering by Fast Search and Find of Density Peaks, 2016.

[42] J. MacQueen, Some methods for classification and analysis of multivariate observations, 1967.

[43] Stephen Grossberg et al., ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures, 1990, Neural Networks.

[44] Dimitrios Gunopulos et al., Automatic subspace clustering of high dimensional data for data mining applications, 1998, SIGMOD '98.

[45] Hongjie Jia et al., Study on density peaks clustering based on k-nearest neighbors and principal component analysis, 2016, Knowl. Based Syst.

[46] Hong Yan et al., An adaptive spatial fuzzy clustering algorithm for 3-D MR image segmentation, 2003, IEEE Transactions on Medical Imaging.

[47] Yingjie Tian et al., A Comprehensive Survey of Clustering Algorithms, 2015, Annals of Data Science.