Natural neighbor-based clustering algorithm with local representatives

Clustering by identifying cluster centers is important for detecting patterns in a data set. However, many center-based clustering algorithms cannot handle data sets containing non-spherical clusters. In this paper, we propose a novel clustering algorithm, NaNLORE, based on natural neighbors and local representatives. Natural neighbor, a recently proposed neighbor concept, is used to compute local density and to find local representatives, which are points with locally maximal density. We first find the local representatives and then select cluster centers from among them. A density-adaptive distance is introduced to measure the distance between local representatives, which helps in clustering data sets with complex manifold structure. Cluster centers are characterized by a higher density than their neighbors and a relatively large density-adaptive distance from any local representative with higher density. In experiments, we compare NaNLORE with existing algorithms on synthetic and real data sets. The results show that NaNLORE outperforms the compared algorithms, especially on non-spherical and manifold data.
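The pipeline summarized above (natural neighbor search, local density, local representatives, density-adaptive distance, center selection) can be sketched in Python under several stated assumptions. The abstract does not give the exact formulas, so everything beyond the order of the steps is hypothetical. The function names are illustrative. The natural neighbor search follows one common formulation: grow r until every point has a reverse neighbor or the number of points without one stops changing, then keep mutual r-nearest neighbors. Density is assumed to be the inverse mean distance to the λ nearest neighbors. A local representative is assumed to be a point denser than all of its natural neighbors. The density-adaptive distance is replaced by a shortest path over the kNN graph whose edge length is the Euclidean length divided by the mean density of its endpoints. Centers are ranked by ρ·δ, a score borrowed from density peaks clustering, with the number of centers k supplied by the caller.

```python
# Hypothetical sketch of a NaNLORE-like pipeline; the formulas are assumptions.
import heapq
import math


def natural_neighbor_search(points):
    """Grow r until every point has a reverse neighbor or the count of points
    without one stops changing; return lam, the lam-NN lists, natural neighbors."""
    n = len(points)
    # Brute-force neighbor order for each point (ties keep ascending index order).
    order = [sorted((j for j in range(n) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
             for i in range(n)]
    reverse = [0] * n  # reverse[j]: how many points list j among their r nearest
    r, prev_zero = 0, -1
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse[order[i][r - 1]] += 1
        zero = reverse.count(0)
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero
    knn = [order[i][:r] for i in range(n)]
    # Natural neighbors: points that appear in each other's lam-NN lists.
    nan = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, knn, nan


def local_density(points, knn):
    # ASSUMED density: inverse of the mean distance to the lam nearest neighbors.
    return [len(nb) / (sum(math.dist(points[i], points[j]) for j in nb) + 1e-12)
            for i, nb in enumerate(knn)]


def adaptive_geodesics(points, knn, rho, sources):
    """Dijkstra from each source over the symmetrized kNN graph. ASSUMED edge
    length: Euclidean length divided by the mean density of its two endpoints."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            adj[i].add(j)
            adj[j].add(i)
    geo = {}
    for s in sources:
        dist = [math.inf] * n
        dist[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v in adj[u]:
                w = math.dist(points[u], points[v]) * 2.0 / (rho[u] + rho[v] + 1e-12)
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        geo[s] = dist
    return geo


def select_centers(points, k):
    """Pick k centers among the local representatives by rho * delta."""
    lam, knn, nan = natural_neighbor_search(points)
    rho = local_density(points, knn)

    def key(i):  # strict order: denser first, lower index breaks ties
        return (rho[i], -i)

    # ASSUMED rule: a local representative is denser than all its natural neighbors.
    reps = [i for i in range(len(points)) if all(key(i) > key(j) for j in nan[i])]
    geo = adaptive_geodesics(points, knn, rho, reps)
    scored = []
    for r in reps:
        # delta: adaptive distance to the nearest denser representative
        # (infinite if none is reachable, e.g. the densest one in its component).
        higher = [geo[r][s] for s in reps if key(s) > key(r)]
        delta = min(higher) if higher else math.inf
        scored.append((rho[r] * delta, key(r), r))
    scored.sort(reverse=True)
    return [r for _, _, r in scored[:k]]
```

For two 3x3 integer grids placed 20 units apart, `select_centers(pts, 2)` returns one representative from each grid. The kNN graph has no edges between the grids, so the densest representative of each grid has no reachable denser representative and receives an infinite δ.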
