Identification of Names and Actions of Principal Objects in TV Program Segments Using Closed Captions

This paper proposes a method for automatically extracting the principal video objects that appear in TV program segments, together with their actions, through linguistic analysis of closed captions. We use features based on the text style of the closed captions and learn from them with Quinlan's C4.5 decision-tree learning algorithm. For each video shot, we extract a noun describing a video object and a verb describing an action. To evaluate the method, we conducted experiments on extracting video segments in which animals appear and perform actions, using 20 episodes of a Nature program. We obtained F-values of 0.609 for extracting video segments in which animals appear and 0.699 for extracting the action of "eating." We then applied our method to a further 20 episodes and generated a multimedia encyclopedia of animals. This provided a total of 387 video clips of 105 kinds of animals and 261 video clips of 56 kinds of actions.
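For readers unfamiliar with the F-values reported above, the following is a minimal sketch of how such a score is commonly computed. It assumes the balanced F-value (the harmonic mean of precision and recall); the abstract does not state which weighting was used. The function name and the counts in the usage line are hypothetical and are not taken from the paper's experiments.

```python
def f_value(true_positives, false_positives, false_negatives):
    """Balanced F-value: harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall; the abstract
    does not say which variant of the F-value was used.
    """
    predicted = true_positives + false_positives  # segments the method extracted
    actual = true_positives + false_negatives     # segments that are truly relevant
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not the paper's data).
example_score = f_value(true_positives=6, false_positives=2, false_negatives=4)
```

With these hypothetical counts, precision is 6/8 and recall is 6/10, so the score reflects a trade-off between extracting only correct segments and finding all relevant ones.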
