A number of research and software development groups have developed name identification technology, but few have addressed cross-document coreference, that is, identifying the same named entity across documents. In a collection of documents, where there are multiple discourse contexts, there is a many-to-many correspondence between names and entities, which makes it a challenge to map names to entities automatically and correctly. Recently, Bagga and Baldwin proposed a method for determining whether two names refer to the same entity by measuring the similarity between the document contexts in which they appear. Inspired by their approach, we have revisited our current cross-document coreference heuristics, which make relatively simple decisions based on matching strings and entity types. We have devised an improved algorithm that shows promise, and we discuss it in this paper.
[1] W. Bruce Croft, et al. An Association Thesaurus for Information Retrieval. RIAO, 1994.
[2] Nina Wacholder, et al. Extracting Names from Natural-Language Text. 2000.
[3] Dragomir R. Radev, et al. Building a Generation Knowledge Source using Internet-Accessible Newswire. ANLP, 1997.
[4] Nina Wacholder, et al. Disambiguation of Proper Names in Text. ANLP, 1997.
[5] Breck Baldwin, et al. Entity-Based Cross-Document Coreferencing Using the Vector Space Model. COLING, 1998.
[6] James W. Cooper, et al. Lexical navigation: visually prompted query expansion and refinement. DL '97, 1997.