Entity Identification in Documents Expressing Shared Relationships

This paper addresses the problem of entity identification in documents in which key identity attributes are missing. The most common approach takes a single entity reference and determines the "best match" between its attributes and a set of candidate identities selected from an appropriate entity catalog. This paper describes a new technique, multiple-reference, shared-relationship identity resolution, that can be applied when a document references several entities that share a specific relationship, a situation that often occurs in published documents. It also reports the results of a recent test of this technique on obituary notices. The preliminary results show that, when a shared relationship is asserted, the multiple-reference technique can provide higher-quality identification than single-reference matching.
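The contrast between the two approaches can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the method evaluated in this paper. The catalog entries, the `family_id` field that stands in for the shared relationship, the toy name-match score, and the functions `single_reference` and `multiple_reference` are all hypothetical. Single-reference matching scores each reference on its own. The multiple-reference version chooses one candidate per reference and keeps only the combinations whose candidates share the asserted relationship.

```python
from itertools import product

# Hypothetical catalog entries: (id, first, last, family_id).
# family_id is an assumed stand-in for the shared relationship a catalog records.
CATALOG = [
    ("p1", "John", "Smith", "F1"),
    ("p2", "John", "Smith", "F2"),
    ("p3", "Mary", "Smith", "F2"),
    ("p4", "Mary", "Smyth", "F3"),
]


def attribute_score(ref, entry):
    """Toy similarity: one point per exactly matching name field."""
    _, first, last, _ = entry
    return int(ref["first"] == first) + int(ref["last"] == last)


def single_reference(ref):
    """Pick the best-scoring candidate for one reference in isolation.
    On ties, max() returns the first maximal entry in catalog order."""
    return max(CATALOG, key=lambda e: attribute_score(ref, e))


def multiple_reference(refs):
    """Pick one distinct candidate per reference so that all chosen candidates
    share the same family_id, maximizing the total attribute score.
    Returns None if no combination satisfies the shared relationship."""
    best, best_score = None, -1
    # Exhaustive search: exponential in len(refs), fine only for a sketch.
    for combo in product(CATALOG, repeat=len(refs)):
        if len({e[3] for e in combo}) != 1:  # relationship not shared
            continue
        if len({e[0] for e in combo}) != len(combo):  # same person used twice
            continue
        score = sum(attribute_score(r, e) for r, e in zip(refs, combo))
        if score > best_score:
            best, best_score = combo, score
    return best


refs = [{"first": "John", "last": "Smith"}, {"first": "Mary", "last": "Smith"}]
print(single_reference(refs[0])[0])               # p1 (ties with p2)
print([e[0] for e in multiple_reference(refs)])   # ['p2', 'p3']
```

Matched alone, "John Smith" ties between `p1` and `p2`, and the single-reference sketch simply returns the first. Requiring both references to share a `family_id` resolves the tie to `p2`, because only family `F2` contains candidates for both names. A practical system would prune candidates before combining them rather than enumerate every combination.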