A static video summarization method based on the sparse coding of features and representativeness of frames

This paper presents a video summarization method designed specifically for producing static summaries of consumer videos. Because consumer videos usually have unclear shot boundaries and many low-quality or meaningless frames, we propose a two-step approach: the first step skims the video, and the second step performs content-aware clustering with keyframe selection. Specifically, the first step removes most of the redundant frames, which contain little new information, by applying spectral clustering to color histogram features. The result is a condensed video that is shorter and has clearer temporal boundaries than the original. In the second step, we perform a rough temporal segmentation and then apply refined clustering within each temporal segment, where each frame is represented by the sparse coding of its SIFT features. Keyframes are selected from each cluster according to the representativeness and visual quality of frames, where representativeness is defined from the sparse coding and visual quality combines contrast, blur, and image skew measures. Keyframe selection thus amounts to finding frames that are both representative and of high quality, which we formulate as an optimization problem. Experiments on videos of various lengths show that the resulting summaries closely follow the important content of the videos.
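As a rough illustration of the keyframe selection idea, the sketch below scores each frame in one cluster by a weighted sum of representativeness and visual quality and returns the best-scoring frame. It is not the optimization formulated in the paper. The function names, the weight `alpha`, the use of cosine similarity to the cluster's mean sparse code as a stand-in for representativeness, and the assumption that a single quality score per frame has already been computed from contrast, blur, and skew are all hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_keyframe(codes, quality, alpha=0.5):
    """Return the index of the frame with the highest combined score.

    codes   : list of sparse-code vectors, one per frame in the cluster
    quality : list of precomputed quality scores in [0, 1], one per frame
    alpha   : hypothetical trade-off weight between the two terms
    """
    n = len(codes)
    dim = len(codes[0])
    # Assumed proxy for representativeness: similarity to the cluster mean code.
    mean = [sum(c[j] for c in codes) / n for j in range(dim)]
    scores = [
        alpha * cosine(c, mean) + (1.0 - alpha) * q
        for c, q in zip(codes, quality)
    ]
    return max(range(n), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy cluster: frames 0 and 1 are similar, frame 2 is an outlier.
    codes = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    quality = [0.2, 0.9, 0.9]  # frame 0 is blurry in this made-up example
    print(select_keyframe(codes, quality))
```

In this toy cluster, frame 0 is about as representative as frame 1 but has low quality, and frame 2 has high quality but is far from the cluster mean, so the combined score favors frame 1. A weighted sum is only one simple way to trade off the two criteria.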
