Adaptive Topic Modeling with Probabilistic Pseudo Feedback in Online Topic Detection

An online topic detection (OTD) system analyzes a stream of sequential stories in real time, either detecting new topics or associating each story with an existing topic. To handle new stories more precisely, this paper proposes an adaptive topic modeling method that incorporates probabilistic pseudo feedback, so that every topic model is tuned as its environment changes. The distinguishing feature of the method is that it treats every incoming story as pseudo feedback with a certain probability, namely the similarity between the story and the topic. Experimental results show that probabilistic pseudo feedback brings a promising improvement to online topic detection.
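The idea can be illustrated with a minimal sketch. It is not the paper's implementation; it rests on several assumptions that the abstract does not confirm: topics are bags of term weights, similarity is cosine similarity, a hypothetical threshold `new_topic_threshold` decides when a story starts a new topic, only the best-matching topic is updated, and "feedback with a certain probability" is read as weighting the story's contribution by its similarity rather than sampling. The names `cosine` and `process_story` are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def process_story(tokens, topics, new_topic_threshold=0.3):
    """Assign a story to a topic and adapt that topic's model.

    `topics` is a list of term-weight dicts and is modified in place.
    Returns (topic_index, similarity_to_best_existing_topic).
    The threshold value is an assumption made for illustration.
    """
    story = Counter(tokens)

    # Find the most similar existing topic, if any.
    best_idx, best_sim = -1, 0.0
    for i, topic in enumerate(topics):
        sim = cosine(story, topic)
        if sim > best_sim:
            best_idx, best_sim = i, sim

    # Too dissimilar to everything seen so far: start a new topic.
    if best_idx < 0 or best_sim < new_topic_threshold:
        topics.append({t: float(c) for t, c in story.items()})
        return len(topics) - 1, best_sim

    # Probabilistic pseudo feedback (assumed form): the story updates the
    # topic model in proportion to its similarity to that topic.
    topic = topics[best_idx]
    for term, count in story.items():
        topic[term] = topic.get(term, 0.0) + best_sim * count
    return best_idx, best_sim
```

In this reading, a story that matches a topic closely shifts the topic model strongly, while a marginal match contributes little, so no hard decision is needed about which stories count as feedback.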
