Key frame extraction from consumer videos using epitome

Key frame extraction algorithms select a subset of the most informative frames from a video. Key frame extraction has applications in several broad areas of video processing research, such as video summarization, video indexing, and creating prints from video. In this paper, we present a method based on image epitomes [1][2] for extracting key frames from unstructured consumer videos. In the proposed approach, we use image epitomes to measure the dissimilarity between frames of the input video. The dissimilarity scores are then analyzed with a min-max approach to extract the desired number of key frames. The proposed approach requires no shot detection, segmentation, or semantic understanding. A comparison of the results obtained by this method with ground truth agreed upon by multiple judges clearly indicates the feasibility of the proposed approach.
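The abstract does not specify how either step is computed. The following Python sketch is therefore only a hypothetical illustration of the pipeline it outlines: build a pairwise dissimilarity matrix over frames, then run a min-max selection of k frames. The sketch does not learn an epitome. As an assumed stand-in, `dissimilarity` scores how well each patch of one frame can be explained by its best-matching patch anywhere in the other frame. This borrows the patch-based intuition behind epitomes but omits the learned probabilistic model. The selection rule is also an assumption: seed with the frame whose largest dissimilarity is smallest, then repeatedly add the frame farthest from its nearest selected frame. All function names and the patch `size` parameter are hypothetical.

```python
from typing import List

# Hypothetical type alias: a grayscale frame as a row-major grid of intensities.
Frame = List[List[float]]


def patches(frame: Frame, size: int, stride: int) -> List[List[float]]:
    """Flatten every size x size patch whose top-left corner lies on the stride grid."""
    h, w = len(frame), len(frame[0])
    return [
        [frame[r + i][c + j] for i in range(size) for j in range(size)]
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]


def explain_cost(a: Frame, b: Frame, size: int) -> float:
    """Mean squared error of explaining each non-overlapping patch of `a` by its
    best-matching patch anywhere in `b`. Hypothetical stand-in, not the paper's
    epitome measure."""
    source = patches(a, size, size)  # tile `a` without overlap
    library = patches(b, size, 1)    # every patch position in `b`
    total = 0.0
    for p in source:
        total += min(sum((x - y) ** 2 for x, y in zip(p, q)) for q in library)
    return total / len(source)


def dissimilarity(a: Frame, b: Frame, size: int = 2) -> float:
    """Symmetrized patch-explanation cost between two frames."""
    return 0.5 * (explain_cost(a, b, size) + explain_cost(b, a, size))


def dissimilarity_matrix(frames: List[Frame], size: int = 2) -> List[List[float]]:
    """Symmetric matrix of pairwise frame dissimilarities (zero diagonal)."""
    n = len(frames)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dissimilarity(frames[i], frames[j], size)
    return d


def select_key_frames(d: List[List[float]], k: int) -> List[int]:
    """Hypothetical greedy min-max selection of k frame indices."""
    n = len(d)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    # Seed with the minimax frame: the one whose worst-case dissimilarity is smallest.
    chosen = [min(range(n), key=lambda i: max(d[i]))]
    while len(chosen) < k:
        # Add the frame whose nearest already-chosen frame is farthest away (max-min).
        rest = [i for i in range(n) if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(d[i][j] for j in chosen)))
    return sorted(chosen)  # report key frames in temporal order


if __name__ == "__main__":
    # Toy "video": two dark frames, two bright frames, one mid-gray frame (4x4 each).
    flat = lambda v: [[v] * 4 for _ in range(4)]
    frames = [flat(0.0), flat(0.0), flat(10.0), flat(10.0), flat(5.0)]
    print(select_key_frames(dissimilarity_matrix(frames), 3))  # [0, 2, 4]
```

Each new pick maximizes its distance to the closest frame already chosen. As a result, the selection spreads across visually distinct content without using shot boundaries, which is in the spirit of the abstract's shot-free design. The exhaustive patch search is quadratic in the number of patches and is meant only for tiny illustrative frames.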

[1] Shiri Gordon et al. An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003.

[2] Mubarak Shah et al. Detection and representation of scenes in videos. IEEE Transactions on Multimedia, 2005.

[3] Ba Tu Truong et al. Video abstraction: A systematic review and classification. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 2007.

[4] Shingo Uchihashi et al. Summarizing video using a shot importance measure and a frame-packing algorithm. Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), 1999.

[5] Volkan Cevher et al. A game theoretic approach to expander-based compressive sensing. Proceedings of the 2011 IEEE International Symposium on Information Theory, 2011.

[6] Nevenka Dimitrova et al. Video keyframe extraction and filtering: a keyframe is not a keyframe to everyone. CIKM '97, 1997.

[7] Brendan J. Frey et al. Video Epitomes. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), 2005.

[8] J. von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 1928.

[9] Jiebo Luo et al. Towards Extracting Semantically Meaningful Key Frames From Personal Video Clips: From Humans to Computers. IEEE Transactions on Circuits and Systems for Video Technology, 2009.

[10] Nikolas P. Galatsanos et al. Video rushes summarization using spectral clustering and sequence alignment. TVS '08, 2008.

[11] Jiebo Luo et al. First- and third-party ground truth for key frame extraction from consumer video clips. Electronic Imaging, 2007.

[12] Yueting Zhuang et al. Adaptive key frame extraction using unsupervised clustering. Proceedings of the 1998 International Conference on Image Processing (ICIP '98), 1998.

[13] Brendan J. Frey et al. Epitomic analysis of appearance and shape. Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003.

[14] Antonio Criminisi et al. Epitomic location recognition. CVPR, 2008.

[15] S. Kullback. Information Theory and Statistics. 1959.