Decentralized Clustering by Finding Loose and Distributed Density Cores

Abstract: Centroid-based clustering approaches fail to recognize extremely complex, non-isotropic patterns. We analyze the underlying causes and find inherent flaws in these approaches, including Shape Loss, False Distances and False Peaks, which typically cause them to fail on complex patterns. As an alternative, we propose a hybrid decentralized approach named DCore, which overcomes these flaws by finding density cores instead of centroids. The underlying idea is that each cluster has a shrunken density core that roughly retains the shape of the cluster. Each core consists of a set of loosely connected local density peaks, that is, points of higher density than their surroundings. Borders, edges and outliers are distributed around the outside of these cores in a hierarchical structure. Experiments show that DCore recognizes extremely complex patterns and performs well in real applications such as image segmentation and face clustering, regardless of the dimensionality of the space in which the data are embedded.
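The idea of a core made of loosely connected local density peaks can be illustrated with a small sketch. This is not the DCore algorithm itself, whose details the abstract does not give; it is a toy illustration under stated assumptions. The function names `local_density_peaks` and `group_peaks_into_cores`, the Gaussian-kernel density estimate, and the two parameters `radius` and `link_radius` are all hypothetical choices made for this example.

```python
import numpy as np

def local_density_peaks(points, radius):
    """Return (density, peak_idx). A point counts as a local peak when no
    point within `radius` of it has a higher density (assumed definition)."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # Gaussian-kernel density estimate; using `radius` as bandwidth is an assumption.
    density = np.exp(-(dist / radius) ** 2).sum(axis=1)
    neighbors = dist <= radius
    # Highest density found in each point's neighborhood (the point itself included).
    neighborhood_max = np.where(neighbors, density[None, :], -np.inf).max(axis=1)
    peak_idx = np.flatnonzero(density >= neighborhood_max)
    return density, peak_idx

def group_peaks_into_cores(points, peak_idx, link_radius):
    """Label each peak with a core id. Peaks closer than `link_radius` are
    linked, and each connected group of peaks forms one core."""
    peak_pts = points[peak_idx]
    d = np.linalg.norm(peak_pts[:, None, :] - peak_pts[None, :, :], axis=-1)
    linked = d <= link_radius
    core = -np.ones(len(peak_idx), dtype=int)
    n_cores = 0
    for start in range(len(peak_idx)):
        if core[start] != -1:
            continue
        # Depth-first traversal over the "linked" graph of peaks.
        core[start] = n_cores
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if core[j] == -1:
                    core[j] = n_cores
                    stack.append(j)
        n_cores += 1
    return core

# Toy data: an elongated group (two adjacent crosses) and one distant cross.
cross = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
points = np.vstack([cross, cross + [3, 0], cross + [20, 0]])

density, peak_idx = local_density_peaks(points, radius=1.5)
core = group_peaks_into_cores(points, peak_idx, link_radius=3.5)
print(peak_idx.tolist(), core.tolist())
```

On this toy data the centers of the three crosses are the local peaks. The two adjacent centers are linked into a single core that follows the elongated shape, which a single centroid could not represent, while the distant cross forms a core of its own.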
