Unsupervised event segmentation of news content with multimodal cues

In the age of content snacking and mobisodes (mobile episodes), the paradigm of media consumption is changing radically. Media consumption is moving from monolithic, prepackaged, well-edited, and elaborate content presentation to a continuous feed of brief segments, such as singleton episodes and videos a few minutes long, that are often accompanied or initiated by tweets and status updates. In this setting, attention spans are short, and content packaging matters less than the dynamic, 'streaming' nature of the information. This trend profoundly affects the segmentation required to make such a stream of information possible. In this paper, we present a novel method that automatically extracts structured content, called events (e.g., major cast interviews, dialogs, and background segments), from news video in an unsupervised fashion. Two key ideas differentiate this unsupervised method from others: the types of information we use to find events, and the method we use to combine this information into coherent multimedia events. The proposed system exploits audio, visual appearance, detected faces, and mid-level semantic concepts from every video shot. Rather than combining all of these cues at once, the framework clusters each cue independently and then assembles multimedia events by applying coherence rules. Additionally, we discuss the effect of segmentation errors on practical retrieval and content consumption tasks.
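The abstract does not specify the features, the clustering algorithm, or the coherence rules, so the Python below is only a minimal sketch of the general idea of clustering each modality independently and then assembling events. It rests on several assumptions: each modality is represented by a hypothetical per-shot feature vector; each modality is clustered on its own with threshold-based single-linkage clustering (a substitution chosen for illustration, not necessarily the authors' choice); and a hypothetical coherence rule joins adjacent shots into one event when they share a cluster label in at least `min_agree` modalities. The names `single_linkage_labels`, `assemble_events`, and `min_agree`, as well as the toy data, are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage_labels(features: List[Vector], threshold: float) -> List[int]:
    """Cluster the shots of ONE modality independently of the others.

    Single linkage cut at `threshold` equals the connected components of the
    graph linking shots whose distance is <= threshold (union-find below).
    """
    n = len(features)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if euclidean(features[i], features[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel cluster roots as 0..k-1 in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def assemble_events(labels_by_modality: Dict[str, List[int]],
                    min_agree: int) -> List[Tuple[int, int]]:
    """Hypothetical coherence rule: adjacent shots stay in the same event if
    their cluster labels agree in at least `min_agree` modalities.

    Returns inclusive (first_shot, last_shot) index pairs.
    """
    if not labels_by_modality:
        return []
    n = len(next(iter(labels_by_modality.values())))
    if n == 0:
        return []
    events: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, n):
        agree = sum(1 for labels in labels_by_modality.values()
                    if labels[i] == labels[i - 1])
        if agree < min_agree:  # coherence broken: close the current event
            events.append((start, i - 1))
            start = i
    events.append((start, n - 1))
    return events


if __name__ == "__main__":
    # Toy 1-D features for six shots (made up for illustration).
    audio = [[0.0], [0.1], [0.2], [5.0], [5.1], [9.0]]
    face = [[1.0], [1.1], [1.0], [7.0], [7.2], [7.1]]
    labels = {
        "audio": single_linkage_labels(audio, 0.5),  # [0, 0, 0, 1, 1, 2]
        "face": single_linkage_labels(face, 0.5),    # [0, 0, 0, 1, 1, 1]
    }
    print(assemble_events(labels, min_agree=1))  # [(0, 2), (3, 5)]
    print(assemble_events(labels, min_agree=2))  # [(0, 2), (3, 4), (5, 5)]
```

Keeping the per-modality clusterings separate means a stricter or looser coherence rule can be tried without re-clustering: in the toy run, raising `min_agree` from 1 to 2 splits the last shot into its own event because only the face cue still links it to the preceding shots.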
