Annotation and Classification of an Email Importance Corpus

This paper presents an email importance corpus annotated through Amazon Mechanical Turk (AMT). Annotators label each email's content type and its importance at three levels of the organizational hierarchy (senior manager, middle manager, and employee). Each email is annotated by five Turkers. An agreement study shows that the agreed AMT annotations are close to the expert annotations. The annotated dataset shows that the proportions of content types differ across hierarchy levels. An email importance prediction system trained on the dataset identifies unimportant emails with a precision of at least 0.55 using only text-based features.
