User Based Aggregation for Biterm Topic Model

Biterm Topic Model (BTM) is designed to model the generative process of word co-occurrence patterns in short texts such as tweets. However, two aspects of BTM may restrict its performance: 1) user individualities are ignored in order to obtain corpus-level word co-occurrence patterns; and 2) the strong assumption that two co-occurring words are assigned the same topic label cannot distinguish background words from topical words. In this paper, we propose the Twitter-BTM model, which addresses these issues by incorporating user-level personalization into BTM. First, we use user-based biterm aggregation to learn a user-specific topic distribution. Second, we estimate each user's preference between background words and topical words by incorporating a background topic. Experiments on a large-scale real-world Twitter dataset show that Twitter-BTM outperforms several state-of-the-art baselines.
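The sketch below shows what "user-based biterm aggregation" can look like in its simplest form. It assumes that a biterm is an unordered pair of words drawn from two distinct positions in the same short text, and that aggregation means pooling the biterms from all of a user's tweets into a single per-user collection. The function names and the `(user_id, tokens)` input format are hypothetical choices for illustration. This is not the paper's implementation, and it omits topic inference and the background topic entirely.

```python
from collections import defaultdict
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text.

    Pairs come from distinct token positions; each pair is sorted so that
    (a, b) and (b, a) are represented by the same tuple.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def aggregate_biterms_by_user(tweets):
    """Pool the biterms from all tweets of each user.

    `tweets` is an iterable of (user_id, token_list) pairs, a hypothetical
    input format assumed for this sketch.
    """
    user_biterms = defaultdict(list)
    for user_id, tokens in tweets:
        user_biterms[user_id].extend(extract_biterms(tokens))
    return dict(user_biterms)


if __name__ == "__main__":
    tweets = [
        ("alice", ["apple", "iphone", "launch"]),
        ("alice", ["iphone", "review"]),
        ("bob", ["game", "tonight"]),
    ]
    for user, biterms in aggregate_biterms_by_user(tweets).items():
        print(user, biterms)
```

Pooling per user, rather than over the whole corpus, is what allows a model to estimate a topic distribution for each user, which corpus-level BTM cannot do.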
