Unsupervised scene detection in Olympic video using multi-modal chains

This paper presents a novel unsupervised method for identifying the semantic structure of long, semi-structured video streams. We identify ‘chains’: local clusters of features that repeat in both the video stream and the audio transcripts. Each chain indicates that the temporal interval it demarcates belongs to a single semantic event. When all the chains are layered over one another, dense regions emerge where they overlap, and from these regions we identify the semantic structure of the video. We analyze two clustering strategies that accomplish this task.
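
The layering idea can be illustrated with a minimal sketch. It is not the paper's implementation and it does not reproduce either of the two clustering strategies, which the abstract does not specify. It assumes each chain is reduced to an inclusive interval of time indices (for example shot numbers) and that a simple overlap threshold marks a dense region. The names `chain_density`, `dense_regions`, and `min_overlap`, along with the sample intervals, are hypothetical.

```python
from typing import List, Tuple

# Assumption: a chain is an inclusive (start, end) interval of time indices.
Chain = Tuple[int, int]


def chain_density(chains: List[Chain], length: int) -> List[int]:
    """Count how many chains cover each time index in [0, length)."""
    delta = [0] * (length + 1)  # difference array; requires end < length
    for start, end in chains:
        delta[start] += 1
        delta[end + 1] -= 1
    density, running = [], 0
    for t in range(length):
        running += delta[t]
        density.append(running)
    return density


def dense_regions(density: List[int], min_overlap: int) -> List[Chain]:
    """Return maximal runs where at least min_overlap chains overlap."""
    regions, run_start = [], None
    for t, d in enumerate(density):
        if d >= min_overlap and run_start is None:
            run_start = t  # a dense run begins here
        elif d < min_overlap and run_start is not None:
            regions.append((run_start, t - 1))  # the run ended at t - 1
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(density) - 1))
    return regions


# Hypothetical chains: two groups of overlapping intervals with a gap between.
chains = [(0, 4), (1, 5), (2, 3), (8, 11), (9, 12), (10, 11)]
density = chain_density(chains, length=14)
scenes = dense_regions(density, min_overlap=2)
print(density)
print(scenes)
```

Thresholding the overlap count is only one simple way to turn the layered chains into candidate scenes; a real system would need a principled way to choose the threshold or would replace it with a clustering step.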
