Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email

Previous work in social network analysis (SNA) has modeled the existence of links from one entity to another, but not attributes of those links such as language content or topics. We present the Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities. The model builds on Latent Dirichlet Allocation (LDA) and the Author-Topic (AT) model. Its key addition is that the distribution over topics is conditioned distinctly on both the sender and the recipient, which steers the discovery of topics according to the relationships between people. We give results on both the Enron email corpus and a researcher's email archive. They provide evidence that clearly relevant topics are discovered, and also that the ART model better predicts people's roles and gives lower perplexity on previously unseen messages. We also present the Role-Author-Recipient-Topic (RART) model, an extension to ART that explicitly represents people's roles.
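The central idea is that topics are conditioned on the (sender, recipient) pair. A minimal sketch of the ART generative story makes this concrete. It rests on several assumptions:

- Symmetric Dirichlet priors with hyperparameters `alpha` (per-pair topic mixtures) and `beta` (per-topic word distributions).
- For each word, a recipient is drawn uniformly from that message's recipient list.

The function names (`generate_art_corpus`, `sample_dirichlet`, `categorical`) and the default hyperparameter values are hypothetical and for illustration only. This is not the authors' code, and inference (for example, Gibbs sampling) is not shown.

```python
import random


def sample_dirichlet(rng, alpha, k):
    """Draw a k-dimensional symmetric Dirichlet(alpha) sample via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Guard against floating-point underflow for very small alpha.
        return [1.0 / k] * k
    return [d / total for d in draws]


def categorical(rng, probs):
    """Return an index drawn with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_art_corpus(messages, num_people, num_topics, vocab_size,
                        alpha=1.0, beta=0.1, seed=0):
    """Sample words for each message under an ART-style generative story.

    messages: list of (author, recipients, length) tuples, where people are
    integers in [0, num_people) and recipients is a non-empty list.
    Returns one list per message of (recipient, topic, word) triples.
    Default alpha/beta values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # One topic mixture per ordered (author, recipient) pair: the key ART idea.
    theta = {(a, r): sample_dirichlet(rng, alpha, num_topics)
             for a in range(num_people) for r in range(num_people)}
    # One word distribution per topic, shared by all pairs.
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]

    corpus = []
    for author, recipients, length in messages:
        tokens = []
        for _ in range(length):
            x = rng.choice(recipients)                # recipient drawn uniformly (assumption)
            z = categorical(rng, theta[(author, x)])  # topic depends on author AND recipient
            w = categorical(rng, phi[z])              # word depends only on the topic
            tokens.append((x, z, w))
        corpus.append(tokens)
    return corpus
```

In the Author-Topic model, `theta` would be indexed by the author alone. Indexing it by the (author, recipient) pair lets the same sender use different topic mixtures with different correspondents.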
