New fusional framework combining sparse selection and clustering for key frame extraction

Key frame extraction facilitates rapid browsing and efficient video indexing in many applications. To be effective, however, key frames must preserve sufficient video content while remaining compact and representative. This study proposes a key frame extraction framework that combines sparse selection (SS) with mutual information-based agglomerative hierarchical clustering (MIAHC) to generate effective video summaries. In the proposed framework, the SS algorithm is first applied to the original video sequence to obtain optimal key frames. Then, using content-loss minimisation and representativeness ranking, several candidate key frames are selected and grouped as initial clusters. An improved MIAHC then acts as a post-processor, eliminating redundant images to produce the final key frames. Because the SS stage only needs to deliver candidate key frames rather than accurate ones, the framework avoids the information redundancy and computational complexity that afflict conventional SS methods. Applying the improved MIAHC to these candidate key frames, rather than to the original video, yields accurate key frames and also reduces the clustering time for large videos. Comparative experiments on two benchmark datasets verify that the proposed SS–MIAHC framework outperforms conventional methods.
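The redundancy-elimination stage can be illustrated with a minimal sketch. It is not the improved MIAHC of the proposed framework: it assumes greyscale candidate frames with values in [0, 255], estimates mutual information from a joint intensity histogram, and greedily merges the two clusters whose representatives share the most information until no pair exceeds a threshold. The function names, the bin count, the threshold and the choice of the first member as cluster representative are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_information(a, b, bins=16):
    """Mutual information (in nats) between two equally sized greyscale frames."""
    joint, _, _ = np.histogram2d(
        a.ravel(), b.ravel(), bins=bins, range=[[0, 256], [0, 256]]
    )
    pxy = joint / joint.sum()              # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal of frame a
    py = pxy.sum(axis=0, keepdims=True)    # marginal of frame b
    nz = pxy > 0                           # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def merge_redundant(frames, threshold):
    """Greedy agglomerative merging of candidate frames (illustrative only).

    Each frame starts as its own cluster. The pair of clusters whose
    representatives (assumed here to be the first member) have the highest
    mutual information is merged, until no pair reaches `threshold`.
    Returns the indices of the surviving representatives.
    """
    clusters = [[i] for i in range(len(frames))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi = mutual_information(
                    frames[clusters[i][0]], frames[clusters[j][0]]
                )
                if mi > best:
                    best, pair = mi, (i, j)
        if best < threshold:
            break                          # remaining clusters are dissimilar
        i, j = pair
        clusters[i].extend(clusters.pop(j))
    return [c[0] for c in clusters]
```

Passing a list of candidate frames, for example the output of a sparse selection step, returns the indices of the frames kept as key frames. The threshold has to be tuned to the histogram resolution, since the mutual information of a frame with itself is bounded by the logarithm of the number of bins.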