Modeling Intention in Email - Speech Acts, Information Leaks and Recommendation Models

Every day, more than half of American adult internet users read or write email messages at least once. The prevalence of email has significantly affected the working world: it is a great asset on many levels, yet at times a costly liability. In an effort to improve various aspects of work-related communication, this work applies machine learning techniques to a large body of email data. Several effective models are proposed that can aid with the prioritization of incoming messages, help coordinate shared tasks, improve the tracking of deadlines, and prevent disastrous information leaks. Carvalho presents many data-driven techniques that can positively impact work-related email communication and offers robust models that may be applied to future machine learning tasks.
