Detecting Hotspot Information Using Multi-Attribute Based Topic Model

Microblogging, as a kind of social network, has become increasingly important in our daily lives. Enormous amounts of information are produced and shared every day, and detecting hot topics in this mass of information can help people reach the essential content more quickly. However, because microblog posts are short and sparse and include a large number of meaningless tweets, traditional topic detection methods are often ineffective at detecting hot topics. In this paper, we propose a new topic model named multi-attribute latent Dirichlet allocation (MA-LDA), in which the time and hashtag attributes of microblogs are incorporated into the LDA model. By introducing the time attribute, the MA-LDA model can decide whether or not a word should appear in hot topics. Meanwhile, compared with the traditional LDA model, applying the hashtag attribute in the MA-LDA model gives core words an artificially high ranking in the results, which improves the expressiveness of the output. Empirical evaluations on real data sets demonstrate that our method detects hot topics more accurately and efficiently than several baselines. Our results provide strong evidence of the importance of the temporal factor in extracting hot topics.
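The intuition behind the two attributes can be illustrated with a small Python sketch. This is not the MA-LDA model and involves no LDA inference; it is a toy word-scoring heuristic written only to show how a time attribute can separate bursty words from everyday ones and how a hashtag attribute can lift core words in a ranking. The function name `hot_word_scores`, the parameters `hashtag_boost` and `smoothing`, and the sample posts are all hypothetical and do not come from the paper.

```python
from collections import Counter


def hot_word_scores(posts, current_window, hashtag_boost=2.0, smoothing=1.0):
    """Toy heuristic (an assumption, not MA-LDA): score words in the current window.

    posts: list of (window_index, text) pairs.
    Returns a dict mapping each word seen in the current window to a score.
    """
    recent = Counter()      # word counts in the current time window
    past = Counter()        # word counts summed over earlier windows
    tagged = set()          # words that appear as a hashtag anywhere
    past_windows = set()    # distinct earlier windows that were observed

    for window, text in posts:
        for token in text.lower().split():
            is_tag = token.startswith("#") and len(token) > 1
            word = token.lstrip("#")
            if not word:
                continue
            if is_tag:
                tagged.add(word)
            if window == current_window:
                recent[word] += 1
            elif window < current_window:
                past[word] += 1
                past_windows.add(window)

    n_past = max(len(past_windows), 1)
    scores = {}
    for word, count in recent.items():
        # Time attribute: ratio of the current count to the average past count.
        baseline = past[word] / n_past
        burst = (count + smoothing) / (baseline + smoothing)
        # Hashtag attribute: multiply the score of words used as hashtags.
        scores[word] = burst * (hashtag_boost if word in tagged else 1.0)
    return scores


posts = [
    (0, "weather is nice today"),
    (0, "nice lunch today"),
    (1, "#earthquake hits city today"),
    (1, "earthquake damage reported today"),
]
scores = hot_word_scores(posts, current_window=1)
top_word = max(scores, key=scores.get)
```

In this toy data, "today" occurs equally often in both windows and so receives a neutral score, while "earthquake" is both new in the current window and used as a hashtag, so it ranks first. A topic model would instead fold these signals into its generative process rather than apply them as a post-hoc score.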
