Determine the Entity Number in Hierarchical Clustering for Web Personal Name Disambiguation

Users are often frustrated by ambiguous names in web search results when trying to find information about a person. Hierarchical clustering methods are often used to group the occurrences of a personal name that refer to the same entity. Because the correct number of entities for a given personal name is not known in advance, we must determine the cut points in the dendrogram to achieve high disambiguation accuracy. In this paper, we explore the appropriate cut points in hierarchical clustering for web personal name disambiguation. We first measure the similarity and density distribution of the search result pages, and then propose an approach that combines global distribution features with local features from the cut points to select appropriate cut points. Finally, we perform experiments on real-world datasets, and the results show that our method is effective.
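To make the notion of a cut point concrete, the following is a minimal sketch of agglomerative clustering in which the number of entities is read off by cutting the dendrogram. It is not the approach proposed in this paper: it assumes a simple "largest gap in merge distance" heuristic, average linkage, and pages represented as one-dimensional numbers purely for illustration. The function names `average_linkage` and `cut_at_largest_gap` are hypothetical.

```python
from itertools import combinations


def average_linkage(points):
    """Agglomerative clustering of 1-D points with average linkage.

    Returns a list of (partition, merge_distance) pairs, one per level of
    the dendrogram, starting from singletons (distance 0.0).
    """
    clusters = [[p] for p in points]
    history = [([list(c) for c in clusters], 0.0)]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the two clusters.
            total = sum(abs(a - b) for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append(([list(c) for c in clusters], d))
    return history


def cut_at_largest_gap(history):
    """Assumed heuristic: cut just before the merge with the largest
    increase in merge distance over the previous level."""
    dists = [d for _, d in history]
    gaps = [dists[k + 1] - dists[k] for k in range(len(dists) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    return history[k][0]


# Toy "pages": three well-separated groups, i.e. three entities.
pages = [0, 1, 2, 50, 51, 90]
partition = cut_at_largest_gap(average_linkage(pages))
entity_number = len(partition)
```

On this toy input the cut falls where the merge distance jumps sharply, so the estimated entity number matches the three visible groups. Real search result pages rarely separate this cleanly, which is why a single global heuristic of this kind can fail.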