Modeling topic and community structure in social tagging: The TTR-LDA-Community model

The presence of social networks in complex systems has made networks and community structure a focal point of study in many domains. Previous studies have focused on the structural emergence and growth of communities and on the topics displayed within the network. However, few scholars have closely examined the relationship between the thematic and structural properties of networks. Therefore, this article proposes the Tagger Tag Resource-Latent Dirichlet Allocation-Community model (TTR-LDA-Community model), which combines the Latent Dirichlet Allocation (LDA) model with the Girvan-Newman community detection algorithm through an inference mechanism. Using social tagging data from Delicious, this article demonstrates the clustering of active taggers into communities, the topic distributions within communities, and the ranking of taggers, tags, and resources within these communities. The data analysis evaluates patterns in community structure and topical affiliations diachronically. The article evaluates the effectiveness of community detection and of the inference mechanism embedded in the model, and finds that the TTR-LDA-Community model outperforms traditional models in tag prediction. These findings have implications for scholars in domains concerned with community detection, profiling, and recommender systems. © 2011 Wiley Periodicals, Inc.
