A sparse topic model for bursty topic discovery in social networks

Bursty topic discovery aims to automatically identify bursty events and to keep tracking known events. Existing methods rely mainly on topic models. However, the sparsity of short texts poses a challenge for traditional topic models, because there are too few words to learn from in the original corpus. To tackle this problem, we propose a Sparse Topic Model (STM) for bursty topic discovery. First, we model bursty topics and common topics separately, so that changes in word usage over time can be detected and bursty words discovered. Second, we introduce a “spike and slab” prior to decouple the sparsity and smoothness of a distribution. The bursty words are then leveraged to discover bursty topics automatically. Finally, to evaluate the effectiveness of the proposed algorithm, we collect a Sina Weibo dataset and conduct a range of experiments. Both qualitative and quantitative evaluations demonstrate that the proposed STM algorithm compares favorably against several state-of-the-art methods.
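
To make the “spike and slab” idea concrete, the following is a minimal sketch of how such a prior can decouple sparsity from smoothness when drawing a topic-word distribution. It is an illustration under stated assumptions, not the STM model itself: the function name `sample_sparse_topic` and the parameters `select_prob` and `smoothing` are hypothetical. Binary selectors (the "spike") decide which words a topic may use, and a Dirichlet draw (the "slab") spreads probability smoothly over the selected words only.

```python
import random


def sample_sparse_topic(vocab_size, select_prob, smoothing, rng):
    """Draw one topic-word distribution under a spike-and-slab style prior.

    Assumed generative sketch (hypothetical, not the paper's exact model):
      b_v ~ Bernoulli(select_prob)     # "spike": may the topic use word v?
      phi ~ Dirichlet(smoothing * b)   # "slab": smooth over selected words
    """
    selectors = [1 if rng.random() < select_prob else 0
                 for _ in range(vocab_size)]
    if not any(selectors):
        # Force one word on so the distribution stays well defined.
        selectors[rng.randrange(vocab_size)] = 1

    # A Dirichlet draw is a normalized vector of independent Gamma draws;
    # unselected words get exactly zero mass.
    weights = [rng.gammavariate(smoothing, 1.0) if b else 0.0
               for b in selectors]
    total = sum(weights)
    return selectors, [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    selectors, phi = sample_sparse_topic(
        vocab_size=20, select_prob=0.3, smoothing=0.5, rng=rng)
    print("selected words:", sum(selectors))
    print("nonzero probabilities:", sum(1 for p in phi if p > 0))
```

In this sketch, `select_prob` controls how many words a topic covers (sparsity), while `smoothing` controls how evenly mass is spread over those words (smoothness), so the two can be tuned independently.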
