An Optimal Density Peak Algorithm Based on Data Field and Information Entropy

Manual selection of the threshold dc and of the cluster centers is a major limitation of the clustering by fast search and find of density peaks (DPC) algorithm. In this paper, data field theory is introduced to select the threshold dc adaptively. The selection of cluster centers is treated as a segmentation of a dataset that contains positive and negative samples, and information entropy is calculated to measure the purity of the dataset after each segmentation. The segmentation with the maximum entropy reduction places all the cluster centers in the positive sample set, thus avoiding the errors caused by operating the Decision Graph manually. A comparison of clustering results on artificial datasets shows that the proposed method (F_DPC) chooses the threshold dc reasonably according to the distribution of the data and is robust to noise. Evaluation metrics from experiments on real-life datasets show that the proposed method accurately determines the number of clusters and achieves higher clustering accuracy.
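As a rough illustration of how a data field can drive parameter selection, the sketch below picks an impact factor sigma by minimizing the entropy of the normalized potentials. This is an assumption about the general technique, not the procedure of this paper: the Gaussian-like potential with unit masses, the candidate grid, the toy points, and the names `potential_entropy` and `select_sigma` are all hypothetical, and the mapping from sigma to the threshold dc is not stated in the abstract and is left out.

```python
import math

def potential_entropy(points, sigma):
    """Shannon entropy of the normalized data-field potentials for one sigma.

    Assumption: every point has unit mass and contributes a Gaussian-like
    potential exp(-(d / sigma)^2) at distance d.
    """
    potentials = []
    for p in points:
        # The sum includes the point itself (distance 0), so phi >= 1.
        phi = sum(math.exp(-(math.dist(p, q) / sigma) ** 2) for q in points)
        potentials.append(phi)
    total = sum(potentials)
    return -sum((phi / total) * math.log(phi / total) for phi in potentials)

def select_sigma(points, candidates):
    """Return the candidate sigma with the smallest potential entropy."""
    return min(candidates, key=lambda s: potential_entropy(points, s))

# Hypothetical toy data: two small groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
candidates = [0.05 * k for k in range(1, 201)]  # assumed search grid

sigma = select_sigma(points, candidates)
entropy = potential_entropy(points, sigma)
print(sigma, entropy)
```

The entropy approaches its maximum, log(n), both when sigma is very small (every point only feels itself) and when it is very large (every point feels all others equally), so the minimum lies at an intermediate scale that reflects the distribution of the data.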