Automatic Creation of a Speech Archive by Monologue Scene Detection in News Videos

As large-capacity HDDs spread and large amounts of video accumulate, there is growing demand for automatically detecting and presenting the scenes a user wants. In this report, we propose a method for detecting monologue scenes in news videos, such as speeches and interviews, and for creating a news speech archive from them. For monologue scene detection, existing techniques are combined through media integration of image, audio, and text information. The result is an automatic monologue scene detection method that uses only the input videos as its source. To create the news speech archive, we use monologue scenes labeled with person names that appear in the closed-caption text. In experiments, we obtained a recall of 37% and a precision of 52%, measured as the percentage of correct answers for the persons with the three largest speech clusters.
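For readers unfamiliar with the two reported measures, the following is a minimal sketch of how precision and recall are conventionally computed from a set of detected items and a set of ground-truth items. The function name `precision_recall` and the scene identifiers are hypothetical and chosen only for illustration; they are not the data or the evaluation code used in the experiments.

```python
def precision_recall(detected, relevant):
    """Return (precision, recall) for two collections of item identifiers."""
    detected = set(detected)
    relevant = set(relevant)
    # Items that were both detected and are actually correct.
    true_positives = len(detected & relevant)
    # Precision: fraction of detected items that are correct.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: fraction of correct items that were detected.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: scene identifiers are made up for illustration.
detected_scenes = {"s1", "s2", "s3", "s4"}
relevant_scenes = {"s2", "s3", "s5", "s6", "s7"}
p, r = precision_recall(detected_scenes, relevant_scenes)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this convention, a precision above the recall, as reported here, means that the labeled scenes returned were correct more often than the correct scenes were found.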
