Statistical Models for Exploring Individual Email Communication Behavior

As digital communication devices play an increasingly prominent role in our daily lives, the ability to analyze and understand our communication patterns becomes more important. In this paper, we investigate a latent variable modeling approach for extracting information from individual email histories, focusing in particular on understanding how an individual communicates over time with recipients in their social network. The proposed model consists of latent groups of recipients, each of which is associated with a piecewise-constant Poisson rate over time. Inference of group memberships, temporal changepoints, and rate parameters is carried out via Markov Chain Monte Carlo (MCMC) methods. We illustrate the utility of the model by applying it to both simulated and real-world email data sets.
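As a rough illustration of the generative side of such a model, the sketch below simulates daily email counts for latent recipient groups, each with a piecewise-constant Poisson rate. It is a minimal sketch under stated assumptions, not the paper's implementation: the daily time unit, the group names, the changepoints, the rates, and all function names are hypothetical, and the MCMC inference step is not shown.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def rate_at(day, changepoints, rates):
    """Piecewise-constant rate: rates[i] holds on the i-th segment.

    changepoints must be sorted; len(rates) == len(changepoints) + 1.
    """
    segment = sum(1 for c in changepoints if day >= c)
    return rates[segment]


def simulate_group_counts(n_days, changepoints, rates, seed=0):
    """Simulate the number of emails sent to one group on each day."""
    rng = random.Random(seed)
    return [
        sample_poisson(rate_at(day, changepoints, rates), rng)
        for day in range(n_days)
    ]


# Hypothetical latent groups (illustrative values only).
groups = {
    "project_team": {"changepoints": [30, 60], "rates": [0.5, 4.0, 1.0]},
    "family": {"changepoints": [45], "rates": [0.2, 0.8]},
}

simulated = {
    name: simulate_group_counts(90, g["changepoints"], g["rates"], seed=i)
    for i, (name, g) in enumerate(groups.items())
}
```

Inference would run in the opposite direction: given only the observed counts, recover the group memberships, the changepoints, and the rates.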
