Latent Community Topic Analysis: Integration of Community Discovery with Topic Modeling

This article studies the problem of latent community topic analysis in text-associated graphs. With the development of social media, a lot of user-generated content is available with user networks. Along with rich information in networks, user graphs can be extended with text information associated with nodes. Topic modeling is a classic problem in text mining and it is interesting to discover the latent topics in text-associated graphs. Different from traditional topic modeling methods considering links, we incorporate community discovery into topic analysis in text-associated graphs to guarantee the topical coherence in the communities so that users in the same community are closely linked to each other and share common latent topics. We handle topic modeling and community discovery in the same framework. In our model we separate the concepts of community and topic, so one community can correspond to multiple topics and multiple communities can share the same topic. We compare different methods and perform extensive experiments on two real datasets. The results confirm our hypothesis that topics could help understand community structure, while community structure could help model topics.

[1]  J. Lafferty,et al.  Mixed-membership models of scientific publications , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[2]  M. Newman Coauthorship networks and patterns of scientific collaboration , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[3]  Andrew McCallum,et al.  Group and Topic Discovery from Relations and Their Attributes , 2005, NIPS.

[4]  Hongyuan Zha,et al.  Probabilistic models for discovering e-communities , 2006, WWW '06.

[5]  Chao Liu,et al.  A probabilistic approach to spatiotemporal theme pattern mining on weblogs , 2006, WWW '06.

[6]  Deng Cai,et al.  Topic modeling with network regularization , 2008, WWW.

[7]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[8]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  David M. Blei,et al.  Probabilistic topic models , 2012, Commun. ACM.

[11]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[12]  Philip S. Yu,et al.  A probabilistic framework for relational clustering , 2007, KDD '07.

[13]  Jure Leskovec,et al.  Empirical comparison of algorithms for network community detection , 2010, WWW '10.

[14]  Jiawei Han,et al.  Modeling hidden topics on document manifold , 2008, CIKM '08.

[15]  Bo Zhao,et al.  Probabilistic topic models with biased propagation on heterogeneous information networks , 2011, KDD.

[16]  Bei Yu,et al.  A cross-collection mixture model for comparative text mining , 2004, KDD.

[17]  Chong Wang,et al.  Collaborative topic modeling for recommending scientific articles , 2011, KDD.

[18]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[19]  David M. Blei,et al.  Introduction to Probabilistic Topic Models , 2010 .

[20]  M E J Newman,et al.  Fast algorithm for detecting community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[21]  Thomas L. Griffiths,et al.  Probabilistic author-topic models for information discovery , 2004, KDD.

[22]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[23]  Thomas L. Griffiths,et al.  The Author-Topic Model for Authors and Documents , 2004, UAI.

[24]  Lise Getoor,et al.  Link mining: a survey , 2005, SKDD.

[25]  Srinivasan Parthasarathy,et al.  Community Discovery in Social Networks: Applications, Methods and Emerging Trends , 2011, Social Network Data Analytics.

[26]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[27]  Huan Liu,et al.  Community Detection and Mining in Social Media , 2010, Community Detection and Mining in Social Media.

[28]  William W. Cohen,et al.  Block-LDA: Jointly Modeling Entity-Annotated Text and Entity-Entity Links , 2014, Handbook of Mixed Membership Models and Their Applications.

[29]  A. Banerjee,et al.  Social Topic Models for Community Extraction , 2008 .

[30]  C. Lee Giles,et al.  Efficient identification of Web communities , 2000, KDD '00.

[31]  Ramesh Nallapati,et al.  Link-PLSA-LDA: A New Unsupervised Model for Topics and Influence of Blogs , 2021, ICWSM.

[32]  David A. Cohn,et al.  The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity , 2000, NIPS.

[33]  Yizhou Sun,et al.  Heterogeneous source consensus learning via decision propagation and negotiation , 2009, KDD.

[34]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[35]  Hai Yang,et al.  ACM Transactions on Intelligent Systems and Technology - Special Section on Urban Computing , 2014 .

[36]  Andrew McCallum,et al.  Topic and Role Discovery in Social Networks , 2005, IJCAI.

[37]  John Yen,et al.  An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks , 2007, 2007 IEEE Intelligence and Security Informatics.

[38]  Yan Liu,et al.  Topic-link LDA: joint models of topic and author community , 2009, ICML '09.

[39]  William W. Cohen,et al.  Block-LDA: Jointly Modeling Entity-Annotated Text and Entity-Entity Links , 2014, Handbook of Mixed Membership Models and Their Applications.

[40]  Yong Yu,et al.  A topical link model for community discovery in textual interaction graph , 2010, CIKM.

[41]  Andrew McCallum,et al.  Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email , 2007, J. Artif. Intell. Res..

[42]  John Yen,et al.  Probabilistic Community Discovery Using Hierarchical Latent Gaussian Mixture Model , 2007, AAAI.

[43]  Yong Yu,et al.  Mining topics on participations for community discovery , 2011, SIGIR.

[44]  Yizhou Sun,et al.  iTopicModel: Information Network-Integrated Topic Modeling , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[45]  Weixiong Zhang,et al.  An Efficient Spectral Algorithm for Network Community Discovery and Its Applications to Biological and Social Networks , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[46]  T. Snijders,et al.  Estimation and Prediction for Stochastic Blockstructures , 2001 .