Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks

Email is an integral part of everyday life. For business email in particular, extracting and analysing the underlying communication networks can reveal interesting patterns in processes and decision making within a company. Fraud detection is another application area where precise detection of communication networks is essential. In this paper we present an approach based on recurrent neural networks to untangle email threads that originate from forward and reply behaviour. We further classify parts of emails into two or five zones to capture not only header and body information but also greetings and signatures. We show that our deep learning approach outperforms state-of-the-art systems based on traditional machine learning and hand-crafted rules. In addition to using the well-known Enron email corpus for our experiments, we created a new annotated email benchmark corpus from Apache mailing lists.
