Infinite Topic Modelling for Trend Tracking - Hierarchical Dirichlet Process Approaches with Wikipedia Semantic based Method

The current affairs that people follow closely vary from period to period, and the evolution of trends is reflected in media reports. This paper tracks trends by combining non-parametric Bayesian approaches with temporal information, and presents two topic modelling methods. The first is an infinite temporal topic model, which obtains the topic distribution over time by placing a prior on time while discovering topics dynamically. To better organize event trends, the second is a progressive superposed topic model, which simulates the whole evolutionary process of topics, including the generation of new topics, the evolution of stable topics and the vanishing of old topics, via a series of superposed topic distributions generated by a hierarchical Dirichlet process. Both approaches aim to solve this real-world task while avoiding the Markov assumption and removing the limit on the number of topics. In addition, we employ Wikipedia-based semantic background knowledge to improve the discovered topics and their readability. The experiments are carried out on a corpus of BBC news about American Forum. The results demonstrate better organized topics, the evolutionary processes of topics over time, and the effectiveness of the models.
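To illustrate what it means for a non-parametric prior to leave the number of topics open, the following is a minimal sketch of a single-level Chinese restaurant process, the clustering view of a Dirichlet process. It is not the paper's model: the hierarchical Dirichlet process shares topics across groups of documents and the paper adds temporal information, neither of which is shown here. The function name `crp_assignments` and the parameters `alpha` and `seed` are illustrative assumptions.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample topic assignments from a Chinese restaurant process.

    Hypothetical helper for illustration only; alpha > 0 is the
    concentration parameter controlling how often new topics appear.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items assigned to topic k
    assignments = []   # assignments[i] = topic index of item i
    for _ in range(n_items):
        # An existing topic k is chosen with weight counts[k];
        # a brand-new topic is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new topic
        else:
            counts[k] += 1     # join an existing topic
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = crp_assignments(n_items=200, alpha=1.0)
    print("number of topics used:", len(counts))
    print("topic sizes:", counts)
```

The number of topics is never passed in; it grows with the data, and larger values of `alpha` tend to produce more topics.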
