Multi-video synopsis for video representation

The world is covered with millions of cameras, each recording a huge amount of video. Watching these videos is time-consuming, and most of the footage is of little interest because it contains little activity. Video representation is thus an important technology for tackling this issue. However, conventional video representation methods mainly focus on a single video and aim to reduce spatiotemporal redundancy as much as possible. In contrast, this paper describes a novel approach that presents the dynamics of multiple videos simultaneously, aiming at a less intrusive viewing experience. Given a main video and multiple supplementary videos, the proposed approach automatically constructs a synthesized multi-video synopsis by integrating the supplementary videos into the most suitable spatiotemporal portions of the main video. The problem of finding a suitable integration between the main video and the supplementary videos is formulated as a maximum a posteriori (MAP) problem, in which the desired properties related to a less intrusive viewing experience, i.e., informativeness, consistency, visual naturalness, and stability, are maximized. This problem is solved using an efficient Viterbi beam search algorithm. Furthermore, an informative blending algorithm is proposed that naturalizes the connecting boundary between different videos. The proposed method has a wide variety of applications, such as visual information representation, surveillance video browsing, video summarization, and video advertising. The effectiveness of multi-video synopsis is demonstrated in extensive experiments over different types of videos and different synopsis cases.
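To make the search step concrete, the following is a minimal sketch of a generic Viterbi beam search over a sequence of discrete placement choices. It is not the paper's formulation: the candidate positions, the `unary` score (a stand-in for per-frame terms such as informativeness), the `pairwise` score (a stand-in for stability between consecutive frames), and the `saliency` values are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def viterbi_beam_search(
    num_steps: int,
    states: Sequence[int],
    unary: Callable[[int, int], float],     # log-score of choosing state s at step t
    pairwise: Callable[[int, int], float],  # log-score of moving from prev to cur
    beam_width: int,
) -> Tuple[List[int], float]:
    """Return the best-scoring state path found and its total log-score.

    At each step only the `beam_width` best hypotheses (one per end state)
    are kept, so the result is approximate unless beam_width >= len(states).
    """
    # Initial hypotheses: one per state, pruned to the beam width.
    beam = sorted(
        ((unary(0, s), [s]) for s in states),
        key=lambda h: h[0],
        reverse=True,
    )[:beam_width]

    for t in range(1, num_steps):
        # Viterbi recombination: keep only the best hypothesis per end state.
        best: Dict[int, Tuple[float, List[int]]] = {}
        for score, path in beam:
            for s in states:
                cand = score + pairwise(path[-1], s) + unary(t, s)
                if s not in best or cand > best[s][0]:
                    best[s] = (cand, path + [s])
        # Beam pruning: drop all but the top-scoring hypotheses.
        beam = sorted(best.values(), key=lambda h: h[0], reverse=True)[:beam_width]

    score, path = beam[0]
    return path, score


# Hypothetical example: 4 frames, 3 candidate positions for a supplementary
# video. saliency[t][s] is how informative the main video is at position s in
# frame t; covering a salient region is penalized, as is moving the overlay.
saliency = [
    [0.9, 0.1, 0.5],
    [0.8, 0.2, 0.1],
    [0.9, 0.3, 0.1],
    [0.1, 0.9, 0.9],
]
path, score = viterbi_beam_search(
    num_steps=len(saliency),
    states=[0, 1, 2],
    unary=lambda t, s: -saliency[t][s],
    pairwise=lambda a, b: -0.5 * abs(a - b),
    beam_width=3,
)
print(path, round(score, 3))
```

With `beam_width` equal to the number of states the search reduces to exact Viterbi decoding; smaller beams trade optimality for speed, which is the usual motivation for beam pruning when the number of candidate placements is large.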
