News hotspots detection and tracking based on LDA topic model

With the rapid spread of the Internet and the mobile web, the number of news pages is growing quickly and news content is becoming highly dynamic. It is difficult for ordinary users to find specific information in massive news streams, so automatically analyzing large volumes of news and detecting and tracking news hotspots is of considerable research significance. This research applies the LDA (Latent Dirichlet Allocation) model to topic detection and tracking. News articles collected by crawlers are modeled with LDA in the form of document-topic-word distributions. We propose a method to compute the heat of topics based on these distributions and use it to detect news hotspots. In addition, we track the evolution of topic trends across time slices. The Jensen-Shannon distance is used to measure the similarity between topics in order to identify topic inheritance and topic mutation. We conducted experiments on a dataset of 3462 news texts from news portals. The results show that the proposed model is effective both in detecting hotspots and in discovering meaningful topic evolution trends.
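
The topic-similarity step can be illustrated with a minimal sketch, assuming each topic is represented by its word distribution over a shared vocabulary. The function names `js_distance` and `link_topics`, the base-2 logarithm, the nearest-neighbour matching rule, and the `threshold` value of 0.5 are all assumptions made for illustration; the abstract does not state how inheritance and mutation are decided.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence, base 2).

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so the inputs are valid probability distributions.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL divergence.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by rounding.
    return float(np.sqrt(max(jsd, 0.0)))


def link_topics(prev_topics, curr_topics, threshold=0.5):
    """Match each current topic to its nearest topic in the previous slice.

    The threshold and the labels are hypothetical: a match within the
    threshold is labeled "inherited", anything farther is labeled "mutated".
    """
    links = []
    for j, cur in enumerate(curr_topics):
        dists = [js_distance(cur, prev) for prev in prev_topics]
        i = int(np.argmin(dists))
        label = "inherited" if dists[i] <= threshold else "mutated"
        links.append((j, i, dists[i], label))
    return links
```

In practice the rows of `prev_topics` and `curr_topics` would be the topic-word distributions estimated by LDA for two adjacent time slices, and the threshold would need to be tuned on the data.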
