Reply With: Proactive Recommendation of Email Attachments

Email responses often contain items, such as a file or a hyperlink to an external document, that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user's past emails based on the context of the current conversation, and recommend them for inclusion, to reduce the time and effort involved in composing the response. In this paper, we propose a weakly supervised learning framework for recommending attachable items to the user. As email search systems are commonly available, we constrain the recommendation task to formulating effective search queries from the context of the conversations. The query is submitted to an existing IR system to retrieve relevant items for attachment. We also present a novel strategy for generating labels from an email corpus, without the need for manual annotations, that can be used to train and evaluate the query formulation model. In addition, we describe a deep convolutional neural network that demonstrates satisfactory performance on this query formulation task when evaluated on the publicly available Avocado dataset and a proprietary dataset of internal emails obtained through an employee participation program.
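
The pipeline described above (formulate a query from the conversation, submit it to a search system, and judge the query by whether it retrieves the item that was actually attached) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the IDF-based term selection stands in for the paper's convolutional model, the overlap-count `search` function stands in for a real IR system, and the mailbox, message, and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the mailbox."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def formulate_query(message, idf, k=3):
    """Keep the k message terms that are rarest in the mailbox.

    Assumption: a simple heuristic stand-in for a learned query
    formulation model. Terms never seen in the mailbox are dropped.
    """
    terms = {t for t in tokenize(message) if t in idf}
    return sorted(terms, key=lambda t: (-idf[t], t))[:k]


def search(query, docs):
    """Toy retrieval: rank documents by how many query terms they contain."""
    scored = []
    for i, d in enumerate(docs):
        toks = set(tokenize(d))
        scored.append((sum(1 for t in query if t in toks), i))
    return [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


def reciprocal_rank(ranking, target):
    """Weak label for a query: 1/rank of the item that was really attached."""
    return 1.0 / (ranking.index(target) + 1) if target in ranking else 0.0


# Hypothetical mailbox: each entry describes one past attachable item.
mailbox = [
    "quarterly budget spreadsheet for the finance team",
    "slides from the onboarding workshop",
    "lunch menu for friday",
]
message = "Could you send me the budget spreadsheet again?"
attached_item = 0  # index of the item the user attached in the real reply

idf = idf_table(mailbox)
query = formulate_query(message, idf, k=2)
ranking = search(query, mailbox)
score = reciprocal_rank(ranking, attached_item)
print(query, ranking, score)
```

In this sketch the label comes for free from the corpus: the reply that was actually sent tells us which item a good query should have retrieved, so no manual annotation is needed to score a candidate query.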
