A feasible density peaks clustering algorithm with a merging strategy

The density peaks clustering (DPC) algorithm handles data sets with complex structure efficiently by finding density peaks. It requires neither an iterative process nor many parameters. The DPC algorithm uses the density–distance to find the density peaks. Unfortunately, it splits one cluster into several when that cluster contains multiple density peaks, and it is ineffective on data sets of relatively high dimension. To overcome the first problem, we propose FDPC, a feasible DPC algorithm based on a novel merging strategy motivated by support vector machines. First, after an initial clustering by DPC, the strategy uses the support vectors to calculate a feedback value between every pair of clusters. Then, it merges clusters recursively according to the feedback values to obtain accurate clustering results. To address the second limitation, we introduce nonnegative matrix factorization into FDPC to preprocess high-dimensional data sets before clustering. Experimental results on real-world and artificial data sets demonstrate that our algorithm is robust and flexible, recognizes clusters of arbitrary shape effectively regardless of the space dimension, and outperforms DPC.
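To make the baseline concrete, the following is a minimal sketch of the basic DPC step that FDPC starts from, not the authors' implementation. It assumes a Gaussian density kernel, a user-chosen cutoff `d_c`, and a user-chosen number of centers picked by the largest density-times-distance product; the function name `dpc` and its parameters are hypothetical. The SVM-based merging step and the nonnegative matrix factorization preprocessing are not shown, because the abstract does not specify them.

```python
import math

def dpc(points, d_c, n_centers):
    """Basic density peaks clustering (illustrative sketch only)."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i with a Gaussian kernel (assumed choice);
    # subtract 1 to drop the self term exp(0).
    rho = [sum(math.exp(-(d / d_c) ** 2) for d in row) - 1.0 for row in dist]
    # Visit points from densest to sparsest.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n   # distance to the nearest denser point
    parent = [-1] * n   # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    # Density peaks have both high rho and high delta.
    gamma = [r * d for r, d in zip(rho, delta)]
    gamma[order[0]] = math.inf  # the densest point is always a center
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_centers]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

Calling `dpc(points, d_c=0.2, n_centers=2)` on two well-separated groups of points returns one label per point plus the density and distance values. A cluster that contains several density peaks would be split by this procedure when `n_centers` is set too high, which is the failure mode the merging strategy is meant to repair.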
