Of Information Systems School of Information Systems 11-2014 On Joint Modeling of Topical Communities and Personal Interest in Microblogs

In this paper, we propose the Topical Communities and Personal Interest (TCPI) model for simultaneously modeling topics, topical communities, and users’ topical interests in microblogging data. TCPI considers different topical communities while differentiating users’ personal topical interests from those of topical communities, and learning the dependence of each user on the affiliated communities to generate content. This makes TCPI different from existing models that either do not consider the existence of multiple topical communities, or do not differentiate between personal and community’s topical interests. Our experiments on two Twitter datasets show that TCPI can effectively mine the representative topics for each topical community. We also demonstrate that TCPI significantly outperforms other state-of-the-art topic models in the modeling tweet generation task.

[1]  Jure Leskovec,et al.  Community-Affiliation Graph Model for Overlapping Network Community Detection , 2012, 2012 IEEE 12th International Conference on Data Mining.

[2]  Edoardo M. Airoldi,et al.  Mixed Membership Stochastic Blockmodels , 2007, NIPS.

[3]  Krishna P. Gummadi,et al.  The Emergence of Conventions in Online Social Networks , 2012, ICWSM.

[4]  Feida Zhu,et al.  It Is Not Just What We Say, But How We Say Them: LDA-based Behavior-Topic Model , 2013, SDM.

[5]  Jiawei Han,et al.  Latent Community Topic Analysis: Integration of Community Discovery with Topic Modeling , 2012, TIST.

[6]  Ying Ding,et al.  Community detection: Topological vs. topical , 2011, J. Informetrics.

[7]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[8]  Timothy W. Finin,et al.  Why We Twitter: An Analysis of a Microblogging Community , 2009, WebKDD/SNA-KDD.

[9]  Max Welling,et al.  Distributed Algorithms for Topic Models , 2009, J. Mach. Learn. Res..

[10]  Brian D. Davison,et al.  Empirical study of topic modeling in Twitter , 2010, SOMA '10.

[11]  Hongyuan Zha,et al.  Probabilistic models for discovering e-communities , 2006, WWW '06.

[12]  M E J Newman,et al.  Modularity and community structure in networks. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Hongfei Yan,et al.  Comparing Twitter and Traditional Media Using Topic Models , 2011, ECIR.

[14]  Eric P. Xing,et al.  Spatial compactness meets topical consistency: jointly modeling links and content for community detection , 2014, WSDM.

[15]  Pengtao Xie,et al.  Integrating Document Clustering and Topic Modeling , 2013, UAI.

[16]  Ramesh Nallapati,et al.  Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora , 2009, EMNLP.

[17]  William W. Cohen,et al.  Regularization of Latent Variable Models to Obtain Sparsity , 2013, SDM.

[18]  Jure Leskovec,et al.  Community Detection in Networks with Node Attributes , 2013, 2013 IEEE 13th International Conference on Data Mining.

[19]  Ee-Peng Lim,et al.  On Modeling Community Behaviors and Sentiments in Microblogging , 2014, SDM.

[20]  Thomas L. Griffiths,et al.  The Author-Topic Model for Authors and Documents , 2004, UAI.

[21]  Jure Leskovec,et al.  Detecting cohesive and 2-mode communities indirected and undirected networks , 2014, WSDM.

[22]  Matthew Michelson,et al.  Tweet Disambiguate Entities Retrieve Folksonomy SubTree Step 1 : Discover Categories Generate Topic Profile from SubTrees Step 2 : Discover Profile Topic Profile : “ English Football ” “ World Cup ” , 2010 .

[23]  John Yen,et al.  Advances in Web Mining and Web Usage Analysis, 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Philadelphia, PA, USA, August 20, 2006, Revised Papers , 2007, WebKDD.

[24]  Timothy W. Finin,et al.  Why we twitter: understanding microblogging usage and communities , 2007, WebKDD/SNA-KDD '07.

[25]  Liangjie Hong,et al.  A time-dependent topic model for multiple text streams , 2011, KDD.

[26]  Peter A. Flach,et al.  Evaluation Measures for Multi-class Subgroup Discovery , 2009, ECML/PKDD.

[27]  Scott Sanner,et al.  Improving LDA topic models for microblogs via tweet pooling and automatic labeling , 2013, SIGIR.

[28]  Mary Beth Rosson,et al.  How and why people Twitter: the role that micro-blogging plays in informal communication at work , 2009, GROUP.

[29]  Jure Leskovec,et al.  Defining and Evaluating Network Communities Based on Ground-Truth , 2012, ICDM.

[30]  Krishna P. Gummadi,et al.  Predicting emerging social conventions in online social networks , 2012, CIKM.

[31]  Seunghak Lee,et al.  More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.

[32]  Jun S. Liu,et al.  The Collapsed Gibbs Sampler in Bayesian Computations with Applications to a Gene Regulation Problem , 1994 .

[33]  Peter Ingwersen,et al.  Developing a Test Collection for the Evaluation of Integrated Search , 2010, ECIR.

[34]  William W. Cohen,et al.  From Topic Models to Semi-supervised Learning: Biasing Mixed-Membership Models to Exploit Topic-Indicative Features in Entity Clustering , 2013, ECML/PKDD.

[35]  Víctor M. Eguíluz,et al.  Distinguishing topical and social groups based on common identity and bond theory , 2013, WSDM.