The Joint Inference of Topic Diffusion and Evolution in Social Communities

The prevalence of Web 2.0 techniques has led to a boom in online communities, where topics spread ubiquitously among user-generated documents. Accompanying this diffusion process is the evolution of topic content: documents that adopt a topic also introduce novel content to it. Unlike explicit user behavior (e.g., buying a DVD), both the diffusion paths and the evolutionary process of a topic are implicit, which makes their discovery challenging. In this paper, we track the evolution of an arbitrary topic and reveal the latent diffusion paths of that topic in a social community. We propose a novel and principled probabilistic model that casts this task as a joint inference problem, considering textual documents, social influences, and topic evolution in a unified way. Specifically, a mixture model captures the generation of text according to the diffusion and evolution of the topic, while the whole diffusion process is regularized with user-level social influences through a Gaussian Markov Random Field. Experiments on both synthetic and real-world data show that the discovery of topic diffusion and evolution benefits from this joint inference, and that the proposed model performs significantly better than existing methods.
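
To make the two ingredients named above more concrete, the following is a minimal sketch and not the paper's actual model. It assumes a two-component unigram mixture in which each word of a document is drawn either from an "inherited" topic distribution or from a "novel" one, and it estimates the mixing weight with EM. It also assumes the common quadratic form of a Gaussian Markov Random Field smoothness penalty over a weighted user graph. The function names `em_mixing_weight` and `gmrf_penalty`, the toy distributions, and the probability floor `1e-12` are all hypothetical choices made for illustration.

```python
from collections import Counter


def em_mixing_weight(doc_tokens, p_inherited, p_novel, iters=50, lam=0.5):
    """Estimate the weight of the 'inherited' component in a two-component
    unigram mixture via EM (illustrative assumption, not the paper's model).

    doc_tokens  : list of word tokens in one document
    p_inherited : dict word -> probability under the inherited topic
    p_novel     : dict word -> probability under the novel-content model
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    if total == 0:
        return lam  # nothing to estimate from an empty document
    floor = 1e-12  # hypothetical floor so unseen words do not give 0/0
    for _ in range(iters):
        # E-step: expected number of tokens generated by the inherited component
        expected_inherited = 0.0
        for word, count in counts.items():
            a = lam * p_inherited.get(word, floor)
            b = (1.0 - lam) * p_novel.get(word, floor)
            expected_inherited += count * a / (a + b)
        # M-step: the new weight is the expected fraction of inherited tokens
        lam = expected_inherited / total
    return lam


def gmrf_penalty(values, edges):
    """Quadratic smoothness penalty sum_ij w_ij * (x_i - x_j)^2 over graph edges.

    values : dict node -> real-valued quantity (e.g. a per-user influence score)
    edges  : iterable of (i, j, weight) triples
    """
    return sum(w * (values[i] - values[j]) ** 2 for i, j, w in edges)


if __name__ == "__main__":
    inherited = {"a": 0.5, "b": 0.5}
    novel = {"c": 0.5, "d": 0.5}
    print(em_mixing_weight(["a", "b", "a"], inherited, novel))
    print(em_mixing_weight(["a", "c"], inherited, novel))
    print(gmrf_penalty({0: 0.2, 1: 0.8}, [(0, 1, 2.0)]))
```

In a joint objective of the kind the abstract describes, a penalty like this would typically be subtracted from the mixture log-likelihood so that socially connected users receive similar values. How the paper actually couples the two terms is not stated in the abstract.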
