Modeling Identity in Archival Collections of Email: A Preliminary Study

Access to historically significant email archives poses challenges that arise less often in personal collections. Most notably, searchers may need help making sense of the identities, roles, and relationships of the individuals who participated in archived email exchanges. This paper describes an exploratory study of identity resolution in the public subset of the Enron collection. Address-name and address-address associations found in explicit, embedded, and implied email headers are augmented with name and nickname associations discovered from consistent use in salutations and signatures. Limited transitive closure heuristics are then employed to extend these pairwise associations into richer representations of identity. Assessment of sampled results indicates that many potentially useful nontrivial associations can be detected.
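
The idea of extending pairwise associations through a limited transitive closure can be illustrated with a minimal sketch. The abstract does not specify the actual heuristics, so the hop limit used here is only an assumed stand-in for "limited", and the function names (`build_graph`, `limited_closure`), the `addr:`/`name:`/`nick:` prefixes, and the sample addresses are all hypothetical.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Build an undirected graph from pairwise associations.

    Each pair links two identity tokens, e.g. an address and a name.
    """
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def limited_closure(graph, seed, max_hops=2):
    """Collect every token reachable from `seed` within `max_hops` links.

    Bounding the number of hops is an assumption made for this sketch;
    it keeps one weak association from merging unrelated identities.
    """
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return set(hops)


# Hypothetical associations (not taken from the Enron collection).
pairs = [
    ("addr:jane.doe@example.com", "name:Jane Doe"),  # address-name, from a header
    ("name:Jane Doe", "addr:jdoe@example.org"),      # same name, second address
    ("addr:jdoe@example.org", "nick:JD"),            # nickname, from a signature
    ("nick:JD", "addr:other@example.net"),           # weaker, more distant link
]

graph = build_graph(pairs)
identity = limited_closure(graph, "addr:jane.doe@example.com", max_hops=2)
print(sorted(identity))
```

With a limit of two hops, the seed address is grouped with the name and the second address, while the nickname and the more distant address are left out. Raising `max_hops` trades precision for coverage.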