Paper information - Assembling personal speech collections by monologue scene detection from a news video archive


Monologue scenes in news shows are important because they carry non-verbal information that cannot be expressed through text media. In this paper, we propose a method that detects monologue scenes by individuals appearing in news shows (news subjects) without external or prior knowledge of the show. The method first detects monologue scene candidates by face detection in the frame images, and then excludes scenes that overlap with speech by anchorpersons or reporters (news persons), who are modeled dynamically according to clues obtained from the closed-caption text and the audio stream. As an application of monologue scene detection, we also propose a method that assembles a personal speech collection for each individual who appears in the news. Although the methods still need further improvement for realistic use, we confirmed the effectiveness of employing multimodal information for these tasks, and also observed interesting outputs in the automatically assembled speech collections.
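The exclusion step described in the abstract can be illustrated with a minimal sketch. It assumes that face-detected shots and news-person speech are already available as time intervals; the `Segment` type, the function names, and the `max_overlap_ratio` threshold are hypothetical and are not taken from the paper, which models news persons dynamically from closed-caption and audio clues rather than from given intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A time interval in seconds within one news video (hypothetical type)."""
    start: float
    end: float


def overlap_seconds(a: Segment, b: Segment) -> float:
    """Length of the temporal overlap between two segments (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def filter_monologue_candidates(face_shots, news_person_speech, max_overlap_ratio=0.2):
    """Keep face shots that are mostly free of news-person speech.

    Assumption: the speech segments do not overlap one another, so summing
    the per-segment overlaps gives the covered duration of each shot.
    The 0.2 threshold is an illustrative value, not one from the paper.
    """
    kept = []
    for shot in face_shots:
        duration = shot.end - shot.start
        if duration <= 0:
            continue  # skip degenerate shots
        covered = sum(overlap_seconds(shot, s) for s in news_person_speech)
        if covered / duration <= max_overlap_ratio:
            kept.append(shot)
    return kept


# Usage: three face shots, two intervals of anchor or reporter speech.
face_shots = [Segment(0, 10), Segment(20, 30), Segment(40, 50)]
news_person_speech = [Segment(0, 8), Segment(29, 35)]
candidates = filter_monologue_candidates(face_shots, news_person_speech)
```

In this usage example the first shot is dropped because most of it coincides with news-person speech, while the other two are kept as monologue candidates.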

Hiroshi Murase | Ichiro Ide | Tomokazu Takahashi | Naoki Sekioka
