Topic-Based Structuring of a Very Large-Scale News Video Corpus

We introduce a topic-based inter-video structuring method designed for application to a very large-scale news video corpus, together with user interfaces that let users browse the corpus efficiently according to its topic structure. Although the proposed method is a multimedia-integrated method that refers to both text- and image-based information, this paper focuses on text-based topic segmentation and topic tracking/threading. First, topic segmentation is performed within each video by referring to relations between the keyword vectors of its sentences. Next, topic tracking and threading are performed across the entire video corpus by referring to relations between the keyword vectors of topics. Such analysis should reveal the underlying structure of the corpus, which is not simply a large volume of unrelated data but data whose content-based relational structure itself carries rich information. An evaluation of the segmentation method showed that it performs at a practical level. The proposed method was then applied to 555 daily news videos (approximately 270 hours) obtained from a specific Japanese news program. Although a detailed evaluation is yet to be done, the user interfaces showed good browsing ability, allowing users to retrieve and track a topic thread of interest.
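For illustration only, the two text-based steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each sentence has already been reduced to a list of keywords, represents sentences and topics as raw term-frequency vectors compared by cosine similarity, places a topic boundary wherever two adjacent sentences fall below a similarity threshold, and links two topics into a thread when their vectors are similar enough. The function names and both thresholds (`0.1` for segmentation, `0.3` for threading) are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two keyword-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment(sentences: list[list[str]], threshold: float = 0.1) -> list[list[list[str]]]:
    """Split one video's sentences (keyword lists) into topics.

    Hypothetical simplification: only adjacent sentences are compared, and a
    new topic starts wherever their similarity drops below `threshold`.
    """
    if not sentences:
        return []
    topics = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(Counter(prev), Counter(cur)) < threshold:
            topics.append([cur])  # low similarity: start a new topic
        else:
            topics[-1].append(cur)  # same topic continues
    return topics


def topic_vector(topic: list[list[str]]) -> Counter:
    """Sum the keyword counts of all sentences in a topic."""
    vec = Counter()
    for sentence in topic:
        vec.update(sentence)
    return vec


def thread(topics: list[list[list[str]]], threshold: float = 0.3) -> list[tuple[int, int]]:
    """Link each topic to earlier topics (in broadcast order) whose keyword
    vectors have similarity of at least `threshold`; returns (earlier, later) index pairs."""
    vectors = [topic_vector(t) for t in topics]
    links = []
    for j in range(len(vectors)):
        for i in range(j):
            if cosine(vectors[i], vectors[j]) >= threshold:
                links.append((i, j))
    return links
```

In use, `segment` would be run on each video, and the resulting topics from the whole corpus, concatenated in broadcast order, would be passed to `thread` to obtain candidate thread links. A practical system would likely weight keywords and compare windows of several sentences rather than single adjacent pairs.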