AUG: A combined classification and clustering approach for web people disambiguation

This paper presents a combined supervised and unsupervised approach for multi-document person name disambiguation. Based on feature vectors reflecting pairwise comparisons between web pages, a classification algorithm provides linking information about document pairs, which leads to initial clusters. In addition, two different clustering algorithms are fed with matrices of weighted keywords. In a final step, the "seed" clusters are combined with the results of the clustering algorithms. Results on the validation data show that the combined classification and clustering approach does not always compare favorably to the results obtained by the individual algorithms separately.
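
As an illustration of how pairwise linking decisions can lead to initial clusters, the following minimal sketch groups documents by taking the transitive closure of predicted links with a union-find structure. This is an assumption made for illustration only: the abstract does not state how links are turned into seed clusters, and the function name `seed_clusters` and its input format (document indices and a list of linked index pairs) are hypothetical.

```python
def seed_clusters(n_docs, linked_pairs):
    """Group documents into seed clusters from pairwise link predictions.

    Assumption: two documents end up in the same seed cluster if they are
    connected by a chain of predicted links (transitive closure).
    """
    # Each document starts as its own cluster representative.
    parent = list(range(n_docs))

    def find(i):
        # Follow parent pointers to the representative, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair the classifier marked as linked.
    for a, b in linked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect documents by representative.
    groups = {}
    for doc in range(n_docs):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical classifier output: documents 0-1 and 1-2 refer to the same person.
    print(seed_clusters(5, [(0, 1), (1, 2)]))
```

Documents that receive no positive link remain singleton clusters in this sketch, and could then be assigned by whichever clustering algorithm is combined with the seeds.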