Video summarization via block sparse dictionary selection

Abstract The explosive growth of video data has raised new challenges for many video processing tasks such as video browsing and retrieval. Effective and efficient video summarization (VS), which automatically condenses a video into a succinct version, is therefore urgently needed. Recent years have witnessed advances in sparse representation based approaches to VS. However, existing methods analyze video frames individually for keyframe selection, which can lead to redundancy among the selected keyframes and poor robustness to outlier frames. Because adjacent frames are visually similar, candidate keyframes tend to occur in temporal blocks in addition to being sparse. This paper therefore takes the block-sparsity of candidate keyframes into account and formulates the VS problem as a block sparse dictionary selection model. A simultaneous block version of the Orthogonal Matching Pursuit algorithm (SBOMP) is designed to optimize the model, and two keyframe selection strategies are explored for each block. Experimental results on two benchmark datasets, VSumm and TVSum, demonstrate that the proposed SBOMP based VS method clearly outperforms several state-of-the-art sparse representation based methods in terms of F-score, redundancy among keyframes, and robustness to outlier frames.
