Wikipedia Based News Video Topic Modeling for Information Extraction

Determining the topic of a news video story (NVS) from its audio-visual footage is an important part of metadata generation. In this paper we propose a news story topic modeling approach that takes advantage of online knowledge resources such as Wikipedia to model the topic of a news story. An NVS is modeled as a distribution over several Wikipedia pages related to the story. We also determine the mapping of the NVS to the table of contents (TOC) of a Wikipedia page. The specific advantages of this topic modeling approach are: (1) the topic is interpretable as a weighted distribution over a set of semantically meaningful story title phrases instead of just a collection of words; (2) it facilitates organizing news video stories as a taxonomy that captures several perspectives on the story; and (3) the taxonomy facilitates exploration and non-linear search. Performance evaluations from an information extraction perspective on a large news video corpus validate the efficacy of the proposed topic modeling approach compared to TF-IDF- and LDA-based approaches.
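The abstract does not say how a story is scored against Wikipedia pages. As a rough illustration of representing a story as a weighted distribution over page titles and mapping it to a TOC section, the Python sketch below swaps in plain TF-IDF cosine similarity (one of the baselines mentioned above, not the authors' method). It scores a story transcript against a few page texts, normalizes the scores into weights, and picks the best-matching section of one page. The page titles, page and section texts, the transcript, and every function name here (`topic_distribution`, `best_toc_section`, and the rest) are hypothetical toy assumptions.

```python
# Sketch only: TF-IDF cosine similarity is an assumed stand-in,
# not the topic modeling method proposed in the paper.
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; deliberately simplistic (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_tfidf(docs):
    """Build a smoothed IDF table from docs and return (doc vectors, idf)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def vectorize(text, idf):
    """TF-IDF vector for new text; terms absent from the fitted corpus are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(tokenize(text)).items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarities(transcript, texts_by_name):
    """Cosine similarity of the transcript to each named text (IDF fit on those texts)."""
    names = list(texts_by_name)
    vecs, idf = fit_tfidf([texts_by_name[k] for k in names])
    query = vectorize(transcript, idf)
    return {k: cosine(query, v) for k, v in zip(names, vecs)}


def topic_distribution(transcript, pages):
    """Weights over page titles that sum to 1 (all zeros if nothing matches)."""
    sims = similarities(transcript, pages)
    total = sum(sims.values())
    return {k: (s / total if total else 0.0) for k, s in sims.items()}


def best_toc_section(transcript, sections):
    """Return the TOC heading whose section text best matches the transcript."""
    sims = similarities(transcript, sections)
    return max(sims, key=sims.get)


# Hypothetical, made-up data standing in for Wikipedia page text and TOC sections.
pages = {
    "Earthquake": "earthquake magnitude rescue damage aftershock",
    "Tsunami": "tsunami wave coast warning earthquake",
    "Football": "football match goal team final",
}
earthquake_toc = {
    "Damage": "damage buildings collapsed roads",
    "Rescue efforts": "rescue teams searched rubble survivors",
    "Geology": "fault magnitude plate tectonics",
}
transcript = ("rescue teams report damage from the earthquake and a strong "
              "aftershock; the tsunami warning was lifted")

dist = topic_distribution(transcript, pages)
for title, weight in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{title}: {weight:.3f}")
print("TOC section:", best_toc_section(transcript, earthquake_toc))
```

Dividing by the sum of similarities is only one simple way to turn scores into a distribution. On real data the page texts would be full articles, and the transcript would be text derived from the video, for example from its audio.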

Kong-Wah Wan | Sujoy Roy | Mun-Thye Mak
