Extracting and tracking hot topics of micro-blogs based on improved Latent Dirichlet Allocation

Abstract Micro-blogs have changed the way people live, study, and work. Every day, people want to know which public-opinion events are happening and how they evolve. Extracting and tracking these topics correctly helps us better understand the latest public opinion and follow its evolution. To extract topics from micro-blog posts accurately, we use five unique features of micro-blogs to derive the joint probability distribution of all words and topics, improving LDA into our topic extraction model (named MF-LDA). To track the evolution trend of a topic, we propose a hot topic life cycle model (named HTLCM), which we divide into five stages: birth, growth, maturity, decline, and disappearance. The HTLCM determines whether a topic is a candidate hot topic and estimates the evolution stage of a hot topic. We further propose a hot topic tracking (HTT) algorithm that integrates MF-LDA and HTLCM. First, the HTT assigns the candidate hot topics labeled by HTLCM to the corresponding time windows according to their release time. Second, to obtain the hot topics in each time window, we input the micro-blog posts of each time window into MF-LDA in chronological order. By analyzing changes in these hot topics, we track the changes in their content. The experimental results show that MF-LDA has a lower perplexity and a higher coverage rate than LDA under the same conditions. We determine the parameters of the transition regions of the proposed HTLCM. The MR and FR of the HTLCM are lower than 18%. The average P, R, and F of the HTT algorithm are 85.64%, 84.97%, and 85.66%, respectively. A practical application to the topic "Female driver beats male driver in Chengdu" shows that HTLCM and HTT are effective and practically useful for extracting and tracking hot topics.
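The first HTT step, assigning posts to time windows by release time, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `assign_to_windows`, the `(timestamp, text)` post format, and the fixed window width are all assumptions made for illustration, since the abstract does not specify them. Each resulting bucket would then be passed, in order, to a topic model such as MF-LDA.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def assign_to_windows(posts, start, window):
    """Group (timestamp, text) posts into consecutive fixed-width windows.

    Hypothetical helper: the window width and the post format are
    assumptions, not details taken from the paper.
    Returns a dict mapping window index -> list of post texts,
    ordered by window index.
    """
    windows = defaultdict(list)
    for timestamp, text in posts:
        if timestamp < start:
            continue  # ignore posts released before the tracking period
        # timedelta // timedelta yields an integer window index
        index = (timestamp - start) // window
        windows[index].append(text)
    return dict(sorted(windows.items()))


# Example with made-up posts and a one-day window.
posts = [
    (datetime(2015, 5, 3, 10), "post a"),
    (datetime(2015, 5, 3, 23), "post b"),
    (datetime(2015, 5, 4, 1), "post c"),
]
buckets = assign_to_windows(posts, datetime(2015, 5, 3), timedelta(days=1))
```

Processing the buckets in index order mirrors the "in order" requirement of the second HTT step, so that topic changes between adjacent windows can be compared.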
