On the Use of Visual Soft Semantics for Video Temporal Decomposition to Scenes

This work examines whether semantic information derived from analysis of the visual modality can be exploited for segmenting video into scenes. In contrast to the low-level visual features typically used in previous approaches, this information is obtained by applying trained visual concept detectors such as those developed and evaluated as part of the TRECVID High-Level Feature Extraction Task. A large number of non-binary detectors are used to define a high-dimensional semantic space. In this space, each shot is represented by its vector of detector confidence scores, and the similarity of two shots is evaluated using an appropriate shot semantic similarity measure. The proposed approach is evaluated on two test datasets, using baseline concept detectors trained on a dataset completely different from the test ones. The results show that the use of such semantic information, which we term "visual soft semantics", contributes to improved video decomposition into scenes.
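
As a rough illustration of the shot representation described above, the following minimal sketch treats each shot as a vector of detector confidence scores and compares two shots. The abstract does not specify the shot semantic similarity measure, so cosine similarity is used here purely as a stand-in assumption; the function name `shot_similarity` and the three-detector score vectors are hypothetical.

```python
import math


def shot_similarity(scores_a, scores_b):
    """Cosine similarity between two shots' detector confidence vectors.

    Assumption: cosine similarity is only a placeholder; the measure
    actually proposed in the paper is not given in the abstract.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both shots must be scored by the same detectors")
    dot = sum(a * b for a, b in zip(scores_a, scores_b))
    norm_a = math.sqrt(sum(a * a for a in scores_a))
    norm_b = math.sqrt(sum(b * b for b in scores_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a shot with all-zero confidences matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical confidence scores from three concept detectors
# (for example "outdoor", "face", "vegetation"); values are made up.
shot_1 = [0.9, 0.1, 0.3]
shot_2 = [0.8, 0.2, 0.4]
shot_3 = [0.1, 0.9, 0.2]

print(shot_similarity(shot_1, shot_2))  # semantically close shots
print(shot_similarity(shot_1, shot_3))  # semantically distant shots
```

In a real system the vectors would have one dimension per trained detector, so with a large detector set the space is high-dimensional, and the resulting pairwise similarities would feed whatever scene-boundary algorithm is in use.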
