Generating summary keywords for emails using topics

Email summary keywords, used to concisely represent the gist of an email, can help users manage and prioritize large numbers of messages. We develop an unsupervised learning framework for selecting summary keywords from emails using latent representations of the underlying topics in a user's mailbox. This approach selects words that describe each message in the context of existing topics rather than simply selecting keywords based on a single message in isolation. We present and compare four methods for selecting summary keywords based on two well-known models for inferring latent topics: latent semantic analysis and latent Dirichlet allocation. The quality of the summary keywords is assessed by generating summaries for emails from twelve users in the Enron corpus. The summary keywords are then used in place of entire messages in two proxy tasks: automated foldering and recipient prediction. We also evaluate the extent to which summary keywords enhance the information already available in a typical email user interface by repeating the same tasks using email subject lines.
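The abstract names the two topic models but not the exact scoring rules. The following Python sketch shows one plausible way each model could rank the words of a single message using statistics gathered from the whole mailbox. It is an illustration under stated assumptions, not a reconstruction of the four methods evaluated here. The function names `lsa_keywords` and `lda_keywords` are hypothetical. The LSA variant ranks a message's words by their weight in a rank-k SVD reconstruction of the term-document matrix. The LDA variant assumes a per-message topic mixture `theta_d` and a topic-word matrix `phi` have already been inferred (for example by Gibbs sampling), and ranks words by p(w | d) = Σ_t θ_{d,t} φ_{t,w}.

```python
import numpy as np


def lsa_keywords(X, vocab, doc, k=2, n_keywords=3):
    """Hypothetical LSA scorer: rank the words of message `doc` by their
    weight in a rank-k reconstruction of the mailbox term-document matrix X
    (rows = terms, columns = messages)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Rank-k approximation X_k = U_k diag(s_k) V_k^T; a message's weights now
    # reflect co-occurrence patterns across the whole mailbox.
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    scores = X_k[:, doc].copy()
    # Restrict candidates to words that actually occur in the message.
    scores[X[:, doc] == 0] = -np.inf
    ranked = np.argsort(-scores)[:n_keywords]
    return [vocab[i] for i in ranked if np.isfinite(scores[i])]


def lda_keywords(theta_d, phi, word_ids, vocab, n_keywords=3):
    """Hypothetical LDA scorer: rank the distinct words of a message by
    p(w | d) = sum_t theta_d[t] * phi[t, w]. theta_d (topics,) and
    phi (topics x vocabulary) are assumed to be inferred beforehand."""
    p_w_given_d = theta_d @ phi  # shape: (vocabulary size,)
    candidates = sorted(set(word_ids), key=lambda w: -p_w_given_d[w])
    return [vocab[w] for w in candidates[:n_keywords]]


# Toy mailbox: two messages over a four-word vocabulary (made-up counts).
vocab = ["budget", "meeting", "gas", "pipeline"]
X = np.array([[3, 0], [1, 0], [0, 2], [0, 1]], dtype=float)
print(lsa_keywords(X, vocab, doc=0, k=1))  # ['budget', 'meeting']

# Made-up LDA parameters: two topics, message strongly about topic 0.
theta_d = np.array([0.9, 0.1])
phi = np.array([[0.5, 0.4, 0.05, 0.05],
                [0.05, 0.05, 0.5, 0.4]])
print(lda_keywords(theta_d, phi, [0, 1, 3, 0], vocab, n_keywords=2))  # ['budget', 'meeting']
```

Both scorers only propose words that appear in the message itself, but the scores come from mailbox-level structure (the low-rank SVD or the inferred topics), which matches the abstract's goal of describing a message in the context of existing topics rather than in isolation.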
