A Dynamic Nonparametric Model for Characterizing the Topical Communities in Social Streams

Latent variable models have proven useful for discovering latent structures in observational data. However, the data in social networks often arrive as streams, i.e., both the text content (e.g., emails, user postings) and the network structure (e.g., user friendships) evolve over time. To capture the time-evolving latent structures in such social streams, we propose a fully nonparametric Dynamic Topical Community Model (nDTCM), in which infinitely many latent community variables are coupled with infinitely many latent topic variables in each epoch, and the temporal dependencies between variables across epochs are modeled via a rich-gets-richer scheme. We focus on characterizing three dynamic aspects of social streams: the number of communities or topics changes (e.g., new communities or topics are born and old ones die out); the popularity of communities or topics evolves; and the semantics, such as the community topic distribution, the community participant distribution, and the topic word distribution, drift. Furthermore, we develop an effective online posterior inference algorithm for nDTCM, which matches the online nature of social streams. Experiments on real-world data show the effectiveness of our model at discovering dynamic topical communities in social streams.
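The abstract does not spell out the rich-gets-richer scheme, so the following is only a minimal sketch of one common form of it, assuming a prior in the style of a recurrent Chinese restaurant process. The function name `sample_epoch`, the concentration parameter `alpha`, and the use of raw previous-epoch counts as pseudo-counts are illustrative assumptions, not the nDTCM specification. It shows how clusters (communities or topics) can be born, grow in popularity, and die out across epochs.

```python
import random


def sample_epoch(prev_counts, n_items, alpha, next_id, rng):
    """Assign n_items to clusters for one epoch (hypothetical helper).

    prev_counts: dict mapping cluster id -> item count in the previous epoch.
    alpha:       weight given to opening a brand-new cluster (must be > 0).
    next_id:     smallest cluster id that has never been used.
    Returns (assignments, counts, next_id); counts covers this epoch only,
    so a cluster that attracts no items here is absent from the next prior.
    """
    counts = {}
    assignments = []
    for _ in range(n_items):
        # Candidates: clusters alive last epoch or already used this epoch.
        ids = sorted(set(prev_counts) | set(counts))
        # Rich-gets-richer: weight grows with past and present popularity.
        weights = [prev_counts.get(k, 0) + counts.get(k, 0) for k in ids]
        # One extra candidate: a new cluster, chosen with weight alpha.
        ids.append(next_id)
        weights.append(alpha)
        k = rng.choices(ids, weights=weights)[0]
        if k == next_id:
            next_id += 1  # a new cluster is born
        counts[k] = counts.get(k, 0) + 1
        assignments.append(k)
    return assignments, counts, next_id


# Usage: two consecutive epochs, the second conditioned on the first.
rng = random.Random(0)
a1, c1, nid1 = sample_epoch({}, 50, alpha=1.0, next_id=0, rng=rng)
a2, c2, nid2 = sample_epoch(c1, 50, alpha=1.0, next_id=nid1, rng=rng)
```

In this sketch, passing only the current epoch's counts forward is what lets unpopular clusters disappear; a decayed sum over several past epochs would be another reasonable choice.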
