E-mail categorization using partially related training examples

Automatic e-mail categorization with traditional classification methods requires labelled training data. In a real-life setting, this labelling disrupts the user's workflow. We argue that it may help to use documents, which are generally well organized in directories on the file system, as training data for supervised e-mail categorization, thereby reducing the labelling effort required from users. Previous work demonstrated that the characteristics of documents and e-mail messages differ too much for organized documents to serve as training examples for e-mail categorization with traditional supervised classification methods. In this paper we present a novel network-based algorithm that is capable of taking these differences between documents and e-mails into account. With the network algorithm, documents can be used as training material for e-mail categorization without user intervention. This reduces the users' effort in labelling training examples while still improving the organization of their information flow. The accuracy of the algorithm in categorizing e-mail messages was evaluated on a set of e-mail correspondence related to the documents. In this setting, the proposed network method performed significantly better than traditional text classification algorithms.
