Cross-document person name disambiguation using entity profiles

Given an ambiguous person name as input, a cross-document person name disambiguation system clusters documents so that each cluster contains all and only those documents referring to the same person. In this paper we present our approach to this task. We introduce novel features based on topic models and on document-level entity profiles, that is, sets of information collected for each ambiguous person across the entire document. We also introduce a modified term frequency-inverse document frequency (TF-IDF) weighting scheme to represent entities in a vector-space model (VSM). Disambiguation is then performed via single-link hierarchical agglomerative clustering. In our experiments, the proposed enhanced VSM achieves an average F-measure of 94.03%, an improvement over the previous best results on the same test corpora.
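To make the overall pipeline concrete, the following is a minimal sketch of clustering documents with a vector-space model and single-link hierarchical agglomerative clustering. It rests on several assumptions: it uses a standard smoothed TF-IDF weighting rather than the modified scheme proposed here, it omits the entity-profile and topic-model features, and the function names, the toy documents, and the stopping threshold of 0.5 are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (standard smoothed IDF, not the paper's modified scheme)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_link_cluster(vectors, threshold):
    """Merge the two most similar clusters until no pair reaches the threshold.

    Single link: the similarity of two clusters is the similarity of
    their closest pair of members.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(
                    cosine(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if sim > best_sim:
                    best_sim, best_pair = sim, (i, j)
        if best_sim < threshold:
            break  # remaining clusters are too dissimilar to merge
        i, j = best_pair
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Hypothetical toy documents that all mention the ambiguous name "john smith".
docs = [
    "john smith guitarist band album tour",
    "john smith band album concert guitarist",
    "john smith senator election vote congress",
    "john smith senator congress bill vote",
]
clusters = single_link_cluster(tfidf_vectors(docs), threshold=0.5)
print(clusters)
```

In this toy setting the shared name tokens carry little weight, so the grouping is driven by the context terms. In practice the stopping threshold would need to be tuned on held-out data, since it determines how many distinct persons the system reports.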