Manual selection of the threshold dc and of the cluster centers is a major limitation of the clustering by fast search and find of density peaks (DPC) algorithm. In this paper, data field theory is introduced to select the threshold dc adaptively. The selection of cluster centers is treated as the segmentation of a dataset that contains positive and negative samples, and information entropy is calculated to measure the purity of the dataset after each segmentation. The segmentation with the maximum entropy reduction places all the cluster centers in the positive sample set, thus avoiding the errors caused by operating the Decision Graph manually. A comparison of clustering results on artificial datasets shows that the proposed method (F_DPC) chooses the threshold dc reasonably according to the distribution of the data and is robust to noise. Evaluation metrics from experiments on real-life datasets show that the proposed method accurately determines the number of clusters and achieves higher clustering accuracy.
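The adaptive threshold selection described above can be illustrated with a minimal sketch. It assumes a formulation that is common in the data-field literature: each point receives a Gaussian-like potential from all points, and the impact factor sigma is chosen to minimize the entropy of the normalized potentials. The abstract does not state the exact potential function or how sigma is mapped to dc, so the function names `potential_entropy` and `select_sigma`, the candidate grid, and the potential formula are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def potential_entropy(points, sigma):
    """Entropy of the normalized data-field potentials for a given sigma.

    Assumption: potential of point i is sum_j exp(-(d_ij / sigma)^2).
    """
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)           # pairwise squared distances
    phi = np.exp(-dist2 / sigma ** 2).sum(axis=1)  # includes the self term, so phi >= 1
    p = phi / phi.sum()                          # normalize potentials to a distribution
    return float(-(p * np.log(p)).sum())


def select_sigma(points, candidates):
    """Return the candidate sigma with the smallest potential entropy."""
    entropies = [potential_entropy(points, s) for s in candidates]
    best = int(np.argmin(entropies))
    return float(candidates[best]), entropies


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: two Gaussian blobs of different sizes.
    data = np.vstack([
        rng.normal(0.0, 0.3, size=(60, 2)),
        rng.normal(3.0, 0.3, size=(20, 2)),
    ])
    grid = np.linspace(0.05, 5.0, 100)
    sigma, _ = select_sigma(data, grid)
    print("selected sigma:", sigma)
```

Under these assumptions the entropy approaches its maximum, log(n), when sigma is very small or very large, because every point then has nearly the same potential. The minimum lies in between, where the potentials best reflect the uneven density of the data. The threshold dc would then be derived from the selected sigma by whatever relation the method defines.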