Modeling individual email patterns over time with latent variable models

As digital communication devices play an increasingly prominent role in our daily lives, the ability to analyze and understand our communication patterns becomes more important. In this paper, we investigate a latent variable modeling approach for extracting information from individual email histories, focusing in particular on how an individual communicates over time with recipients in their social network. The proposed model consists of latent groups of recipients, each associated with a piecewise-constant Poisson rate over time. Group memberships, temporal changepoints, and rate parameters are inferred via Markov chain Monte Carlo (MCMC) methods. We illustrate the utility of the model by applying it to both simulated and real-world email data sets.
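To make the generative idea concrete, here is a minimal forward-simulation sketch of a piecewise-constant Poisson rate for each latent recipient group. It is an illustration only, not the paper's implementation and not the MCMC inference procedure. The group names, changepoint days, rate values, the daily time grid, and all function names are hypothetical assumptions introduced for this example.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def rate_at(day, changepoints, rates):
    """Return the piecewise-constant rate in effect on `day`.

    `changepoints` is a sorted list of days at which the rate switches;
    `rates` holds one more entry than `changepoints` (one per segment).
    """
    segment = sum(1 for c in changepoints if day >= c)
    return rates[segment]


def simulate_group_counts(num_days, changepoints, rates, seed=0):
    """Simulate daily email counts sent to one latent group."""
    rng = random.Random(seed)
    return [
        sample_poisson(rate_at(day, changepoints, rates), rng)
        for day in range(num_days)
    ]


# Hypothetical groups (assumed values, not taken from the paper).
groups = {
    "project_team": {"changepoints": [30, 60], "rates": [0.5, 4.0, 1.0]},
    "family": {"changepoints": [], "rates": [0.3]},
}

counts = {
    name: simulate_group_counts(90, g["changepoints"], g["rates"], seed=i)
    for i, (name, g) in enumerate(groups.items())
}

for name, series in counts.items():
    print(name, "total emails over 90 days:", sum(series))
```

In this sketch, inference would run in the opposite direction: given only the observed counts, one would try to recover the group memberships, the changepoints, and the per-segment rates, which is the task the abstract assigns to MCMC.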
