Paper information - Structure in the Enron Email Dataset

Structure in the Enron Email Dataset

We investigate the structures present in the Enron email dataset using singular value decomposition and semidiscrete decomposition. Using word frequency profiles, we show that messages fall into two distinct groups, whose extrema are characterized by short messages and rare words versus long messages and common words. It is surprising that message length and word-use pattern should be related in this way. We also investigate relationships among individuals based on their patterns of word use in email. We show that word use is correlated with function within the organization, as expected. Lastly, we show that relative changes in individuals' word usage over time can be used to identify key players in major company events.
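To make the first step of the abstract concrete, the following is a minimal sketch of how a singular value decomposition can expose groups in word frequency profiles. It is not the authors' pipeline: the toy count matrix, the column centring, and the helper name `svd_embed` are all assumptions made for illustration, and the semidiscrete decomposition is not shown.

```python
import numpy as np


def svd_embed(freq, k=2):
    """Project rows of a message-by-word count matrix onto the top-k singular directions.

    Hypothetical helper for illustration; the preprocessing is an assumption.
    """
    freq = np.asarray(freq, dtype=float)
    # Assumption: centre each word column so the SVD captures variation
    # between messages rather than overall word frequency.
    centred = freq - freq.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Row coordinates in the reduced space: U_k scaled by the singular values.
    return u[:, :k] * s[:k], s


# Toy data (invented): rows are messages, columns are word counts.
# Rows 0-1 use the first two words heavily; rows 2-3 barely use them.
counts = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
])

coords, singular_values = svd_embed(counts, k=2)
print(coords)
print(singular_values)
```

In this toy case the first coordinate places rows 0 and 1 on one side of the origin and rows 2 and 3 on the other, which is the kind of separation the abstract describes at a much larger scale. The overall sign of each singular direction is arbitrary, so only the relative placement is meaningful.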

David B. Skillicorn | P. S. Keila
