Prior-Based Dual Additive Latent Dirichlet Allocation for User-Item Connected Documents

User-item connected documents, such as customer reviews of specific items on online shopping websites and user tips in location-based social networks, have become increasingly prevalent. Inferring the topic distributions of user-item connected documents benefits many applications, including document classification and the summarization of users and items. Although many topic models have been proposed for modeling text, most of them cannot account for the dual role of user-item connected documents (each document is related to one user and one item simultaneously) when generating topic distributions. In this paper, we propose a novel probabilistic topic model called Prior-based Dual Additive Latent Dirichlet Allocation (PDA-LDA). It addresses the dual role of each document by tying the Dirichlet prior of its topic distribution to user and item topic factors, which yields a document-level asymmetric Dirichlet prior. In the experiments, we evaluate PDA-LDA on several real datasets, and the results show that our model performs well against several other models, both in held-out perplexity for text modeling and in a document classification application.
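
To make the idea of a document-level asymmetric Dirichlet prior concrete, the following is a minimal sketch and not the model's actual specification. It assumes that each user and each item carries one real-valued factor per topic, that the two factors are summed, and that an exponential link turns the sum into positive Dirichlet concentrations. The abstract does not state the link function, so the exponential is an assumption of this sketch, and the names `document_prior`, `sample_dirichlet`, `user_factor`, and `item_factor` are hypothetical.

```python
import math
import random


def document_prior(user_factor, item_factor):
    """Combine user and item topic factors additively into a Dirichlet prior.

    Assumption of this sketch: an exponential link keeps every
    concentration parameter strictly positive.
    """
    return [math.exp(a + b) for a, b in zip(user_factor, item_factor)]


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)

# Hypothetical factors over three topics: both the user and the item
# lean toward topic 0, so the document prior favors topic 0 as well.
user_factor = [1.0, -0.5, 0.0]
item_factor = [0.5, 0.0, -1.0]

alpha_d = document_prior(user_factor, item_factor)  # asymmetric prior for this document
theta_d = sample_dirichlet(alpha_d, rng)            # topic distribution of the document
```

Because `alpha_d` differs from one user-item pair to the next, two reviews of the same item by different users receive different priors, which is the dual role the abstract describes.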
