Temporal Video Segmentation to Scenes Using High-Level Audiovisual Features

In this paper, a novel approach to the temporal decomposition of video into semantic units, termed scenes, is presented. In contrast to previous temporal segmentation approaches, which mostly employ low-level visual or audiovisual features, we introduce a technique that jointly exploits low-level and high-level features automatically extracted from the visual and the auditory channels. This technique builds upon the well-known scene transition graph (STG) method, first by introducing a new STG approximation with reduced computational cost, and then by extending the unimodal STG-based temporal segmentation technique to a method for multimodal scene segmentation. The latter exploits, among other cues, the results of a large number of TRECVID-type trained visual concept detectors and audio event detectors. It is based on a probabilistic merging process that combines multiple individual STGs while reducing the need to select and fine-tune several STG construction parameters. The proposed approach is evaluated on three test datasets, comprising TRECVID documentary films, movies, and news-related videos, respectively. The experimental results demonstrate the improved performance of the proposed approach in comparison to other unimodal and multimodal techniques from the relevant literature, and highlight the contribution of high-level audiovisual features toward improved segmentation of video into scenes.
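
As an illustration of the merging idea only, the following minimal sketch shows one way that scene-boundary decisions from several STGs could be combined into a single result. It is not the formulation used in the paper: the function name `merge_boundaries`, the per-feature weights, the voting rule, and the threshold are all assumptions made for this example. Each feature type contributes several STG runs (for instance, built with different construction parameters), each run votes for the shot boundaries it considers scene boundaries, and a boundary is kept when its weighted vote fraction reaches the threshold.

```python
from typing import Dict, List, Set


def merge_boundaries(
    boundary_sets: Dict[str, List[Set[int]]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[int]:
    """Combine scene-boundary votes from several STG runs per feature type.

    boundary_sets maps a feature name (hypothetical, e.g. "visual") to a list
    of runs; each run is the set of shot indices flagged as scene boundaries.
    weights maps each feature name to its assumed relative importance.
    """
    # Normalise by the total weight so every score lies in [0, 1].
    total_weight = sum(weights[name] for name in boundary_sets)
    scores: Dict[int, float] = {}
    for name, runs in boundary_sets.items():
        if not runs:
            continue  # a feature type with no runs casts no votes
        # Each run carries an equal share of its feature type's weight.
        vote = weights[name] / (len(runs) * total_weight)
        for run in runs:
            for shot_index in run:
                scores[shot_index] = scores.get(shot_index, 0.0) + vote
    # Keep the boundaries whose weighted vote fraction reaches the threshold.
    return sorted(b for b, s in scores.items() if s >= threshold)


# Toy input: three visual STG runs and two audio STG runs (made-up values).
example_runs = {
    "visual": [{3, 7}, {3}, {3, 7, 9}],
    "audio": [{3}, {7}],
}
example_weights = {"visual": 0.6, "audio": 0.4}
merged = merge_boundaries(example_runs, example_weights, threshold=0.5)
```

Because the final decision depends on the fraction of runs that agree rather than on any single run, no individual STG parameter setting has to be chosen exactly; this reflects the motivation stated above, under the simplifying assumptions of this sketch.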
