Combining textual and non-textual features for e-mail importance estimation

In this work, we address a binary classification problem: identifying the e-mail messages that the receiver will reply to. The long-term goal is to develop a tool that tells a knowledge worker which e-mails are likely to need a reply. Training examples were extracted from the Enron corpus. We analysed the word n-grams that characterize the messages the receiver replies to. Additionally, we compared a Naive Bayes classifier to a decision tree classifier on the task of distinguishing replied from non-replied e-mails. We found that textual features are well suited to obtaining high accuracy. However, precision and recall differ in interesting ways across the feature selections.
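To make the setup concrete, the following is a minimal sketch of reply prediction with word n-gram features and a multinomial Naive Bayes classifier with add-one smoothing, evaluated by precision and recall. It is not the implementation used in this work: the function and class names, the restriction to unigrams and bigrams, the smoothing choice, and the four toy messages are all assumptions made for illustration, and the decision tree classifier is not covered.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max (hypothetical feature extractor)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayesReplyClassifier:
    """Multinomial Naive Bayes over word n-grams with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.gram_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(word_ngrams(text))
        # Vocabulary: every n-gram seen in any training message.
        self.vocab = set().union(*self.gram_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total = sum(self.gram_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for gram in word_ngrams(text):
                if gram in self.vocab:  # n-grams unseen in training are ignored
                    count = self.gram_counts[c][gram]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


def precision_recall(gold, predicted, positive=1):
    """Precision and recall for the positive ("replied") class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy messages (not from the Enron corpus); 1 = replied, 0 = not replied.
train_texts = [
    "can you send me the report by friday",
    "please let me know your availability",
    "fyi the newsletter is attached",
    "weekly newsletter for all staff",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayesReplyClassifier().fit(train_texts, train_labels)
```

Calling `model.predict(...)` on a new message returns 1 or 0, and `precision_recall` can then be applied to the gold and predicted labels of a held-out set, which is where differences between feature selections would become visible.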
