Feature-based video key frame extraction for low quality video sequences

We present an approach to key frame extraction for structuring user-generated videos on video-sharing websites (e.g., YouTube). Our approach is intended to link existing image search engines to video data. Unlike professional material, user-generated videos are unstructured, follow no fixed rules, and have poor camera work. Their coding quality is also low because of low resolution and high compression. In the first step, we segment video sequences into shots by detecting abrupt cuts and gradual transitions. Longer shots are then segmented into subshots based on location and camera motion features. One representative key frame is extracted per subshot using visual attention features such as lighting, camera motion, and the appearance of faces and text. These key frames are useful for indexing and for searching for similar video sequences using MPEG-7 descriptors [1].
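The shot segmentation and key frame selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' method. Shot boundaries are found with the classic twin-comparison approach on grey-level histogram differences, a swapped-in technique that flags an abrupt cut when one frame-to-frame difference exceeds a high threshold and a gradual transition when moderate differences above a low threshold accumulate past the high one. The key frame of a subshot is taken to be the frame with the highest weighted attention score. The function names, the thresholds `t_high` and `t_low`, the feature layout and the weights are all hypothetical, and subshot segmentation by camera motion is not shown.

```python
import numpy as np


def histogram_differences(frames, bins=16):
    """L1 distance between normalized grey-level histograms of consecutive frames.

    frames: sequence of 2-D uint8 arrays. Returns d, where d[t] compares
    frame t with frame t + 1.
    """
    hists = []
    for frame in frames:
        counts, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hists.append(counts / counts.sum())
    return np.abs(np.diff(np.array(hists), axis=0)).sum(axis=1)


def twin_comparison(d, t_high, t_low):
    """Twin-comparison shot boundary detection (swapped-in technique).

    Returns (start, end, kind) tuples: the transition runs from frame
    `start` to frame `end`, and kind is "cut" or "gradual".
    """
    boundaries = []
    t = 0
    while t < len(d):
        if d[t] >= t_high:
            # One large jump between two frames: abrupt cut.
            boundaries.append((t, t + 1, "cut"))
            t += 1
        elif d[t] >= t_low:
            # Candidate gradual transition: sum the moderate differences.
            start, accumulated = t, 0.0
            while t < len(d) and t_low <= d[t] < t_high:
                accumulated += d[t]
                t += 1
            if accumulated >= t_high:
                boundaries.append((start, t, "gradual"))
        else:
            t += 1
    return boundaries


def select_key_frame(features, weights):
    """Index of the frame with the highest weighted attention score.

    features: (n_frames, n_features) array for one subshot, e.g. hypothetical
    columns [brightness, camera motion, face present, text present].
    weights: hypothetical per-feature weights.
    """
    scores = np.asarray(features, dtype=float) @ np.asarray(weights, dtype=float)
    return int(np.argmax(scores))


def make_frame(bright_fraction, size=10):
    """Synthetic test frame: a fraction of pixels at 220, the rest at 20."""
    frame = np.full(size * size, 20, dtype=np.uint8)
    frame[: int(round(bright_fraction * size * size))] = 220
    return frame.reshape(size, size)


if __name__ == "__main__":
    # Abrupt cut: a dark shot followed by a bright shot.
    cut_clip = [make_frame(0.0)] * 5 + [make_frame(1.0)] * 5
    print(twin_comparison(histogram_differences(cut_clip), 1.0, 0.2))
    # → [(4, 5, 'cut')]
    # Dissolve-like change: bright pixels replace dark ones over five frames.
    dissolve = [make_frame(p) for p in [0, 0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1, 1]]
    print(twin_comparison(histogram_differences(dissolve), 1.0, 0.2))
    # → [(2, 7, 'gradual')]
    # Hypothetical weights: camera motion is penalized, faces and text rewarded.
    subshot = [[0.5, 0.0, 0, 0], [0.6, 0.1, 1, 0], [0.4, 0.0, 0, 1]]
    print(select_key_frame(subshot, [1.0, -0.5, 2.0, 1.5]))  # → 1
```

Two thresholds are used because a single threshold cannot catch a dissolve. In a dissolve each frame-to-frame difference stays small, but the accumulated change across the transition is as large as that of a cut.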

[1] Jong-Un Won et al. Adaptive video-dissolve detection method based on correlation between successive scenes. 2003.

[2] N. Nikolaidis et al. Video shot detection and condensed representation: a review. IEEE Signal Processing Magazine, 2006.

[3] Kebin Jia et al. Video key frame extraction based on spatial-temporal color distribution. 2008 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2008.

[4] José Ignacio Benavides Benítez et al. Reliable real time scene change detection in MPEG compressed video. 2004 IEEE International Conference on Multimedia and Expo (ICME), 2004.

[5] Jae-Gark Choi et al. Correlation based video-dissolve detection. International Conference on Information Technology: Research and Education (ITRE 2003), 2003.

[6] Yu-Jin Zhang et al. Video segmentation and key frame extraction with parametric model. 2008 3rd International Symposium on Communications, Control and Signal Processing, 2008.

[7] B. S. Manjunath et al. Introduction to MPEG-7: Multimedia Content Description Interface. 2002.

[8] Zhi-Cheng Zhao et al. Extraction of semantic keyframes based on visual attention and affective models. 2007 International Conference on Computational Intelligence and Security (CIS 2007), 2007.

[9] Ioannis Pitas et al. Information theory-based shot cut/fade detection and video summarization. IEEE Transactions on Circuits and Systems for Video Technology, 2006.

[10] Da-Wen Xu. A blind video watermarking algorithm based on 3D wavelet transform. 2007.

[11] Xueming Qian et al. Effective fades and flashlight detection based on accumulating histogram difference. IEEE Transactions on Circuits and Systems for Video Technology, 2006.

[12] Rainer Lienhart et al. Comparison of automatic shot boundary detection algorithms. Electronic Imaging, 1998.