Spotting by Association in News Video

This paper introduces Spotting by Association, a novel video analysis method for detecting video segments with typical semantics. Video data carries information in several forms: continuous images, natural language, and sound. For use in a Digital Library, it is essential to segment the video data into meaningful pieces, and detecting such segments requires associating data from each modality, including video, language, and sound. For this purpose, we propose a new method for segment spotting that makes correspondences between image clues detected by image analysis and language clues created by natural language analysis. As a result, relevant video segments with sufficient information in every modality are obtained. We applied our method to closed-captioned CNN Headline News. Video segments showing important situations, namely a speech, a meeting, or a visit, are detected fairly well.
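
To make the idea of associating clues across modalities concrete, the following is a minimal sketch and not the procedure used in this paper. It assumes each clue is a labeled time interval and that a language clue and an image clue correspond when their labels are compatible and their intervals overlap in time. The names `Clue`, `COMPATIBLE`, `overlap`, and `spot_segments`, the clue labels, and the `min_overlap` threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clue:
    """A clue from one modality, given as a labeled time interval (seconds)."""
    label: str
    start: float
    end: float


# Hypothetical compatibility table (an assumption, not taken from the paper):
# which image clue labels may support each language clue label.
COMPATIBLE = {
    "speech": {"face_closeup"},
    "meeting": {"people"},
    "visit": {"outdoor_scene"},
}


def overlap(a: Clue, b: Clue) -> float:
    """Length in seconds of the temporal overlap between two clues."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def spot_segments(image_clues, language_clues, min_overlap=1.0):
    """Pair each language clue with compatible, overlapping image clues.

    Returns a list of (situation, start, end) tuples, where the interval
    is the union of the two associated clue intervals.
    """
    segments = []
    for lang in language_clues:
        allowed = COMPATIBLE.get(lang.label, set())
        for img in image_clues:
            if img.label not in allowed:
                continue  # labels are not compatible
            if overlap(lang, img) >= min_overlap:
                segments.append(
                    (lang.label,
                     min(lang.start, img.start),
                     max(lang.end, img.end))
                )
    return segments


image_clues = [
    Clue("face_closeup", 10.0, 20.0),
    Clue("outdoor_scene", 40.0, 50.0),
]
language_clues = [
    Clue("speech", 12.0, 18.0),   # overlaps the face close-up
    Clue("visit", 60.0, 70.0),    # no overlapping image clue
]
result = spot_segments(image_clues, language_clues)
```

In this toy input, only the speech clue has a compatible image clue that overlaps it in time, so only that segment is reported; the visit clue is dropped because no image evidence supports it. Requiring support from both modalities is the point of the sketch: a segment is kept only when every modality contributes information.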