Forum latent Dirichlet allocation for user interest discovery

The popularity of online forums provides a good opportunity to learn user interests, which can be used in many business scenarios such as product or news recommendation. Many approaches exist for inferring forum topics and users' interests; among them, models in the Author-Topic (AT) family are the most popular. However, a thread in an online forum is composed of a root post and a number of response posts, which may be relevant or irrelevant to the root post. The AT assumption that response posts are generated from their authors' interest topics is therefore incomplete. In this paper, we distinguish between users' serious and unserious interest topics. We argue that the topic of a relevant response post is jointly determined by its author's serious interest topics and the topics of its root post, while the topic of an irrelevant response post is determined only by its author's unserious interest topics. Based on these assumptions, we propose Forum-LDA, which jointly models the generative process of root posts, relevant response posts, and irrelevant response posts. As a result, our model can not only learn more coherent topics and serious interests, but also identify unserious users who publish many irrelevant posts. Extensive experiments on a real forum dataset demonstrate the advantages of our model in tasks such as user interest discovery and unserious user discovery.
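The two assumptions about response posts can be illustrated with a minimal sketch. It is not the Forum-LDA inference procedure, and the abstract does not specify how the author's serious interests and the root post's topics are combined. The fixed-weight mixture, the `mix` parameter, and the function names `sample_topic` and `response_topic` are all hypothetical choices made for illustration only.

```python
import random


def sample_topic(dist, rng):
    """Draw a topic index from a discrete distribution (list of probabilities)."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def response_topic(relevant, root_topics, serious, unserious, mix=0.5, rng=random):
    """Sample the topic of a single response post.

    ASSUMPTION: "jointly determined" is modeled here as a fixed-weight
    mixture of the author's serious interests and the root post's topic
    distribution. The real model may combine them differently.
    """
    if relevant:
        # Relevant response: depends on serious interests AND root-post topics.
        combined = [mix * s + (1.0 - mix) * r for s, r in zip(serious, root_topics)]
        return sample_topic(combined, rng)
    # Irrelevant response: depends only on the author's unserious interests.
    return sample_topic(unserious, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    root_topics = [0.1, 0.8, 0.1]   # topic distribution of the root post
    serious = [0.7, 0.2, 0.1]       # author's serious interests
    unserious = [0.1, 0.1, 0.8]     # author's unserious interests
    print(response_topic(True, root_topics, serious, unserious, rng=rng))
    print(response_topic(False, root_topics, serious, unserious, rng=rng))
```

In this sketch, an author whose posts are mostly drawn through the `unserious` branch would correspond to the unserious users the model aims to identify.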
