Social network analysis has emerged as a key technique in countering crime and terrorism. The Enron e-mail dataset, originally made public and posted to the web by the Federal Energy Regulatory Commission during its investigation, consists of around half a million e-mails among several thousand individuals. It is valuable because it is perhaps the only real e-mail dataset accessible to the research community. This paper presents preliminary results of an analysis of the Enron e-mail dataset based on a variation of the Author-Recipient-Topic (ART) model [1]. The GR-ART model described here uses grammatical relations, rather than bags of words, as features. Our hypothesis is that grammatical relations will yield a more useful model of authors, topics, and recipients than words alone. This research complements earlier work by one of the authors on applying information extraction techniques to cross-document named entity co-reference [2].
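To illustrate the contrast between bag-of-words features and grammatical-relation features, the following minimal Python sketch compares the two on a pair of sentences. It is not the GR-ART implementation: the function names, the example sentences, and the relation labels (`ncsubj`, `dobj`) are assumptions chosen for illustration, and the relation triples are written by hand where a parser would normally supply them.

```python
from collections import Counter

def bag_of_words(tokens):
    """Count lower-cased tokens, ignoring word order and syntax."""
    return Counter(t.lower() for t in tokens)

def gr_features(relations):
    """Count grammatical-relation features.

    Each relation is a (label, head, dependent) triple. The triples are
    assumed to come from a parser; here they are hand-written.
    """
    return Counter(
        f"{label}({head.lower()},{dep.lower()})"
        for label, head, dep in relations
    )

# Two hypothetical sentences that use the same words in different roles.
tokens_a = ["The", "trader", "called", "the", "lawyer"]
tokens_b = ["The", "lawyer", "called", "the", "trader"]

# Hand-written relation triples (illustrative labels, not parser output).
relations_a = [("ncsubj", "called", "trader"), ("dobj", "called", "lawyer")]
relations_b = [("ncsubj", "called", "lawyer"), ("dobj", "called", "trader")]

bow_a, bow_b = bag_of_words(tokens_a), bag_of_words(tokens_b)
gr_a, gr_b = gr_features(relations_a), gr_features(relations_b)

print("Same bag of words:", bow_a == bow_b)
print("Same grammatical-relation features:", gr_a == gr_b)
```

In this toy case the two sentences are indistinguishable as bags of words, while the relation features record who called whom. That distinction is the kind of information a model of authors and recipients might exploit.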
[1] Andrew McCallum, et al. The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email, 2005.
[2] Paul Thompson, et al. Names: A New Frontier in Text Mining, 2003, ISI.
[3] Andrew B. Whinston, et al. Intelligence and Security Informatics: An Information Economics Perspective, 2003, ISI.
[4] Thomas L. Griffiths, et al. Probabilistic author-topic models for information discovery, 2004, KDD.
[5] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.
[6] Ted Briscoe, et al. Parser evaluation: a survey and a new proposal, 1998, LREC.