Joint Latent Dirichlet Allocation for Social Tags

Social tags, a textual source of simple but useful semantic metadata that reflects user preferences or describes web objects, have been widely used in many applications. However, social tags have several unique characteristics, namely sparseness and data coupling (i.e., non-IIDness), which make existing text analysis methods such as LDA not directly applicable. In this paper, we propose a new generative algorithm for social tag analysis, named joint latent Dirichlet allocation, which models the generation of tags based on both the users and the objects, and thus accounts for the coupling relationships among social tags. The model introduces two latent factors that jointly influence tag generation: the user's latent interest factor and the object's latent topic factor, formulated as a user-topic distribution matrix and an object-topic distribution matrix, respectively. A Gibbs sampling approach is adopted to simultaneously infer these two matrices as well as a topic-word distribution matrix. Experimental results on four social tagging datasets show that our model captures more reasonable topics and achieves better performance than five state-of-the-art topic models in terms of the widely used point-wise mutual information metric. In addition, we analyze the learnt topics, showing that our model recovers more themes from social tags whereas LDA may suffer from the topic vanishing problem, and we demonstrate its advantages in social recommendation by evaluating the retrieval results with the mean reciprocal rank metric. Finally, we explore the joint procedure of our model in depth to show the non-IID characteristic of the social tagging process.
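The two evaluation metrics named above, point-wise mutual information (PMI) for topic coherence and mean reciprocal rank (MRR) for retrieval, can be illustrated with a minimal sketch. This is not the paper's evaluation code. The function names `topic_pmi` and `mean_reciprocal_rank` are hypothetical, the toy data is made up for illustration, and skipping word pairs that never co-occur (rather than smoothing their counts) is an assumption of this sketch.

```python
import math
from itertools import combinations


def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item per query (0 if none is found)."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


def topic_pmi(top_words, documents):
    """Mean pairwise PMI of a topic's top words, from document co-occurrence."""
    docs = [set(d) for d in documents]
    n = len(docs)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = prob(w1, w2)
        # Assumption: pairs that never co-occur are skipped, not smoothed.
        if joint > 0:
            scores.append(math.log(joint / (prob(w1) * prob(w2))))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): each "document" is the tag set of one object.
tag_docs = [["sun", "beach"], ["sun", "beach"], ["city", "night"], ["city", "night"]]
coherence = topic_pmi(["sun", "beach"], tag_docs)

# Two queries; the first relevant item is at rank 2 and rank 1 respectively.
mrr = mean_reciprocal_rank([["a", "b", "c"], ["x", "y", "z"]], [{"b"}, {"x"}])
```

A higher mean PMI indicates that a topic's top words tend to appear together, and a higher MRR indicates that the first relevant result tends to appear nearer the top of the ranking.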
