Vector ordering based multimodal video skimming for user videos

Video skimming is the generation of a shorter video that summarizes a given video, using a subset of its segments sufficient to convey its purpose. User videos are often almost structureless and have no predefined script or events to guide summarization. Using multiple modalities with a suitable fusion strategy would be beneficial for skimming such videos. In this paper, r(educed)-ordering based importance ranking of video segments is first performed on the audio and visual channels independently. A round-robin fusion scheme is then proposed for combining importance ranks obtained from multiple modalities, and it is applied to the ranks from the audio and visual channels. The fused rank is used to generate the video summary. Experimental results show that the proposed fusion scheme outperforms relevant low-level fusion and single-modality cases, both when r-ordering and when other schemes are used to determine importance in each modality.
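
The pipeline described above can be illustrated with a minimal sketch. The abstract gives neither the features, the distance measure, nor the exact fusion rule, so everything in the code is an assumption made for illustration: segments are scored by their aggregate Euclidean distance to all other segments (one common form of reduced ordering), a larger score is taken to mean a more important segment, and round-robin fusion is read as taking the next-best segment from each modality in turn while skipping segments already chosen. The names `r_order_ranking` and `round_robin_fuse` are hypothetical and do not come from the paper.

```python
from itertools import zip_longest
from math import dist


def r_order_ranking(features):
    """Rank segment indices by aggregate distance to all other segments.

    Assumption: a larger aggregate distance means a more important segment.
    """
    scores = [sum(dist(f, g) for g in features) for f in features]
    # Sort indices by descending score; ties keep their original order.
    return sorted(range(len(features)), key=lambda i: -scores[i])


def round_robin_fuse(*rankings):
    """Interleave ranked lists, one pick per modality per turn, without repeats."""
    fused, seen = [], set()
    for turn in zip_longest(*rankings):
        for segment in turn:
            if segment is not None and segment not in seen:
                seen.add(segment)
                fused.append(segment)
    return fused


# Toy per-segment feature vectors for four segments (made-up values).
audio_features = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (2.0, 0.0)]
visual_features = [(5.0, 5.0), (0.0, 0.0), (1.0, 1.0), (9.0, 9.0)]

audio_rank = r_order_ranking(audio_features)
visual_rank = r_order_ranking(visual_features)
fused_rank = round_robin_fuse(audio_rank, visual_rank)

# Keep the two top-ranked segments and restore temporal order for playback.
summary_segments = sorted(fused_rank[:2])
```

Because the fusion operates on ranks rather than raw scores, the two modalities need no common scale, which is what distinguishes this kind of scheme from low-level (feature-level) fusion.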
