A novel density peak clustering algorithm based on squared residual error

The density peak clustering (DPC) algorithm is designed to quickly identify intricately shaped clusters in high-dimensional data by finding high-density peaks in a non-iterative manner, using only one threshold parameter. However, DPC has limitations in handling low-density data points because it takes only the global data density distribution into account. As a result, DPC may fail to form low-density clusters; in other words, it may fail to detect anomalies and borderline points. In this paper, we analyze the limitations of DPC and propose a novel density peak clustering algorithm that better handles low-density clustering tasks. Specifically, our algorithm provides a better decision graph than DPC for determining cluster centroids. Experimental results show that our algorithm outperforms DPC and other clustering algorithms on benchmark datasets.
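For readers unfamiliar with the baseline, the following is a minimal sketch of standard DPC, not of the squared-residual-error method proposed in this paper. It assumes the single threshold parameter is the cutoff distance `d_c` of the original DPC formulation, with the cut-off kernel for local density `rho`. `delta` is each point's distance to its nearest denser point, and the decision-graph score is `gamma = rho * delta`. The function name `dpc` and the `n_clusters` argument are hypothetical: taking the top-`n_clusters` values of `gamma` as centroids is one common heuristic standing in for visual inspection of the decision graph.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dpc(points: List[Point], d_c: float, n_clusters: int
        ) -> Tuple[List[int], List[int], List[int], List[float]]:
    """Hypothetical sketch of standard DPC; returns (labels, centers, rho, delta)."""
    n = len(points)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density (cut-off kernel): number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Rank points by decreasing density; the stable sort breaks ties by index.
    order = sorted(range(n), key=lambda i: -rho[i])

    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist[i])
            continue
        # Nearest point among those ranked as denser.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Decision-graph score: centroids have both high rho and high delta.
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: -gamma[i])
    # The densest point always has maximal gamma, so it is always a centroid.
    centers = [order[0]] + [i for i in ranked if i != order[0]][: n_clusters - 1]

    labels = [-1] * n
    for c, idx in enumerate(centers):
        labels[idx] = c
    # Single pass in decreasing density: each non-centre point inherits the
    # label of its nearest denser point, which has already been labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, rho, delta


if __name__ == "__main__":
    # Two well-separated toy blobs (illustrative data only).
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
           (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
    labels, centers, rho, delta = dpc(pts, d_c=0.5, n_clusters=2)
    print(labels)  # labels -> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The sketch shows why DPC is non-iterative: labels are assigned in a single pass. It also hints at the weakness the abstract describes. `rho` depends only on a global cutoff, so points in sparse regions receive small `gamma` values and are absorbed into denser clusters rather than forming their own.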
