Inferring the Diffusion and Evolution of Topics in Social Communities

The prevalence of Web 2.0 technologies has led to a boom in online communities, where topics spread ubiquitously among user-generated documents. This diffusion process is accompanied by the content evolution of the topics, in which documents that adopt a topic introduce novel content. Unlike explicit user behavior (e.g., buying a DVD), both the diffusion paths and the evolutionary process of a topic are implicit, which makes them much more challenging to discover. In this paper, we aim to simultaneously track the evolution of an arbitrary topic and reveal the latent diffusion paths of that topic in a social community. We propose a novel and principled probabilistic model that casts this task as a joint inference problem, taking textual documents, social influences, and topic evolution into consideration in a unified way. Specifically, a mixture model is introduced to model the generation of text according to the diffusion and the evolution of the topic, while the whole diffusion process is regularized with user-level social influences through a Gaussian Markov Random Field. Experiments on both synthetic and real-world data show that the discovery of topic diffusion and evolution benefits from this joint inference, and that the proposed probabilistic model performs significantly better than existing methods.
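To make the regularization idea concrete, the following is a minimal, generic sketch of the kind of smoothness penalty a Gaussian Markov Random Field places on a graph. It is not the paper's model: the function names, the toy edge list, and the per-user scores are all hypothetical, and the abstract does not state the exact form of the prior. The sketch only shows the standard fact that a pairwise penalty of the form sum of w_ij (x_i - x_j)^2 equals the quadratic form x^T L x, where L is the weighted graph Laplacian, so users connected by strong influence edges are encouraged to have similar values.

```python
# Generic illustration of a GMRF-style smoothness penalty on a user graph.
# All names and data here are hypothetical; this is not the paper's model.

def gmrf_penalty(values, edges):
    """Sum of w * (x_i - x_j)^2 over weighted edges (i, j, w)."""
    return sum(w * (values[i] - values[j]) ** 2 for i, j, w in edges)


def laplacian_quadratic_form(values, edges):
    """Compute x^T L x, where L is the weighted graph Laplacian."""
    n = len(values)
    lap = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return sum(
        values[i] * lap[i][j] * values[j]
        for i in range(n)
        for j in range(n)
    )


# Toy graph: user 0 influences user 1 strongly, user 1 influences user 2 weakly.
edges = [(0, 1, 1.0), (1, 2, 0.5)]

# Hypothetical per-user scores (e.g., how strongly each user adopts a topic).
smooth = [0.9, 0.8, 0.7]   # neighbours have similar scores
rough = [0.9, 0.1, 0.8]    # neighbours disagree sharply

print(gmrf_penalty(smooth, edges))
print(gmrf_penalty(rough, edges))
```

In a regularized model of this general kind, such a penalty would be subtracted (with some weight) from the text log-likelihood, so that assignments which disagree across strong influence edges, like `rough` above, are discouraged.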
