QCC: a novel clustering algorithm based on Quasi-Cluster Centers

Cluster analysis aims to classify objects into categories on the basis of their similarity and has been widely used in areas such as pattern recognition and image processing. In this paper, we propose a novel clustering algorithm, QCC, based mainly on two ideas: the density of a cluster center is the highest within its K nearest neighborhood or its reverse K nearest neighborhood, and clusters are separated by sparse regions. In addition, we define a novel similarity measure between clusters to address the complex-manifold problem. In experiments on synthetic and real-world datasets, we compare QCC with the DBSCAN, DP and DAAP algorithms. The results show that QCC performs best, and its advantage is especially pronounced on non-spherical and complex-manifold data.
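The quasi-cluster-center criterion above can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that density is the inverse of the mean distance to the K nearest neighbours, because the abstract does not give QCC's density definition. It also assumes that exact density ties are broken in favour of the smaller index. The function names (`knn_indices`, `reverse_knn`, `knn_density`, `quasi_cluster_centers`) are hypothetical. The cluster-similarity measure and any merging or assignment step are not modeled.

```python
import math


def knn_indices(points, k):
    """For each point, the indices of its k nearest neighbours (excluding itself).
    Distance ties are broken by the smaller index."""
    knn = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        knn.append([j for _, j in dists[:k]])
    return knn


def reverse_knn(knn):
    """rknn[i] lists every point j that has i among its k nearest neighbours."""
    rknn = [[] for _ in knn]
    for j, neighbours in enumerate(knn):
        for i in neighbours:
            rknn[i].append(j)
    return rknn


def knn_density(points, knn):
    """ASSUMED density estimate (not from the paper): inverse mean kNN distance."""
    dens = []
    for i, neighbours in enumerate(knn):
        mean_d = sum(math.dist(points[i], points[j]) for j in neighbours) / len(neighbours)
        dens.append(1.0 / (mean_d + 1e-12))  # small constant guards against duplicate points
    return dens


def quasi_cluster_centers(points, k):
    """Indices of points whose density is the highest in their kNN
    or in their (non-empty) reverse kNN."""
    knn = knn_indices(points, k)
    rknn = reverse_knn(knn)
    dens = knn_density(points, knn)

    def rank(i):
        # Compare by density; ASSUMED tie-break: exact ties favour the smaller index.
        return (dens[i], -i)

    centers = []
    for i in range(len(points)):
        top_in_knn = all(rank(i) > rank(j) for j in knn[i])
        top_in_rknn = bool(rknn[i]) and all(rank(i) > rank(j) for j in rknn[i])
        if top_in_knn or top_in_rknn:
            centers.append(i)
    return centers


if __name__ == "__main__":
    # Two toy "plus"-shaped groups separated by a large empty (sparse) region.
    group_a = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    group_b = [(x + 20, y + 20) for x, y in group_a]
    print(quasi_cluster_centers(group_a + group_b, k=3))  # [0, 5]
```

With `k=3`, only the middle point of each group (indices 0 and 5) is the densest point of its neighbourhood, so one candidate center is found per group. On other data, the reverse-neighbourhood branch can flag extra candidates. How QCC filters these candidates and merges clusters through its similarity measure is not reproduced here.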
