Video summarization and scene detection by graph modeling

We propose a unified approach for video summarization based on the analysis of video structures and video highlights. Two major components in our approach are scene modeling and highlight detection. Scene modeling is achieved by normalized cut algorithm and temporal graph analysis, while highlight detection is accomplished by motion attention modeling. In our proposed approach, a video is represented as a complete undirected graph and the normalized cut algorithm is carried out to globally and optimally partition the graph into video clusters. The resulting clusters form a directed temporal graph and a shortest path algorithm is proposed to efficiently detect video scenes. The attention values are then computed and attached to the scenes, clusters, shots, and subshots in a temporal graph. As a result, the temporal graph can inherently describe the evolution and perceptual importance of a video. In our application, video summaries that emphasize both content balance and perceptual quality can be generated directly from a temporal graph that embeds both the structure and attention information.

[1]  Michael A. Smith,et al.  Video skimming and characterization through the combination of image and language understanding techniques , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Boon-Lock Yeo,et al.  Video visualization for compact presentation and fast browsing of pictorial content , 1997, IEEE Trans. Circuits Syst. Video Technol..

[3]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Nuno Vasconcelos,et al.  A spatiotemporal motion model for video summarization , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[5]  David S. Doermann,et al.  Video summarization by curve simplification , 1998, MULTIMEDIA '98.

[6]  Alan Hanjalic,et al.  Automated high-level movie segmentation for advanced video-retrieval systems , 1999, IEEE Trans. Circuits Syst. Video Technol..

[7]  Rainer Lienhart Dynamic video summarization of home video , 1999, Electronic Imaging.

[8]  Sang Uk Lee,et al.  Efficient video indexing scheme for content-based retrieval , 1999, IEEE Trans. Circuits Syst. Video Technol..

[9]  Alan Hanjalic,et al.  An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis , 1999, IEEE Trans. Circuits Syst. Video Technol..

[10]  Jeho Nam,et al.  Dynamic video summarization and visualization , 1999, MULTIMEDIA '99.

[11]  Shih-Fu Chang,et al.  Determining computable scenes in films and their structures using audio-visual memory models , 2000, ACM Multimedia.

[12]  Xin Liu,et al.  Video summarization using singular value decomposition , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[13]  C. Koch,et al.  Computational modelling of visual attention , 2001, Nature Reviews Neuroscience.

[14]  HongJiang Zhang,et al.  A new perceived motion based shot content representation , 2001, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205).

[15]  Chong-Wah Ngo,et al.  Video partitioning by temporal slice coherency , 2001, IEEE Trans. Circuits Syst. Video Technol..

[16]  Xavier Binefa,et al.  An EM algorithm for video summarization, generative model approach , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[17]  HongJiang Zhang,et al.  A user attention model for video summarization , 2002, MULTIMEDIA '02.

[18]  HongJiang Zhang,et al.  A model of motion attention for video skimming , 2002, Proceedings. International Conference on Image Processing.

[19]  Shih-Fu Chang,et al.  A utility framework for the automatic generation of audio-visual skims , 2002, MULTIMEDIA '02.

[20]  Chong-Wah Ngo,et al.  Motion analysis and segmentation through spatio-temporal slices processing , 2003, IEEE Trans. Image Process..

[21]  Chong-Wah Ngo,et al.  Motion-Based Video Representation for Scene Change Detection , 2004, International Journal of Computer Vision.