Regularized SVD-Based Video Frame Saliency for Unsupervised Activity Video Summarization

Storage, browsing and analysis of human activity videos can be significantly facilitated by automated video summarization. Unsupervised key-frame extraction remains the most widely applicable technique for summarizing activity videos. However, the specific properties of such videos make the problem difficult to solve. Typical algorithms for this task belong to either the video frame clustering family or the dictionary-of-representatives family, with salient dictionary learning having been proposed recently. Under this formulation, the video frames selected as key-frames are those that simultaneously best reconstruct the entire video and are salient compared to the rest. This paper improves upon such a method by replacing the video frame saliency estimation term with one based on regularized SVD-based low-rank approximation, taking advantage of the well-established correlation between mid-range matrix singular values and salient regions. Extensive empirical evaluation demonstrates the high performance of both the salient dictionary learning framework and the specific proposed method.
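The idea of scoring frames against a regularized low-rank approximation can be illustrated with a minimal sketch. The abstract does not specify the regularizer or the exact saliency term, so everything below is an assumption made for illustration: the function name `frame_saliency`, the Tikhonov-style filter factors, the parameters `rank` and `lam`, and the use of the per-frame residual norm as the saliency score are hypothetical and are not taken from the paper.

```python
import numpy as np


def frame_saliency(D, rank, lam=0.0):
    """Hypothetical per-frame saliency from a regularized truncated SVD.

    D    : array of shape (n_features, n_frames), one column per video frame.
    rank : number of leading singular triplets kept in the approximation.
    lam  : Tikhonov-style damping (an assumption, not the paper's regularizer).

    Returns one non-negative score per frame: the norm of the part of that
    frame's descriptor not explained by the low-rank approximation.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    k = min(rank, s.size)
    s_k = s[:k]

    # Filter factors s^2 / (s^2 + lam); guarded so that zero singular values
    # with lam == 0 produce 0 instead of NaN.
    denom = s_k ** 2 + lam
    filt = np.divide(s_k ** 2, denom, out=np.zeros_like(s_k), where=denom > 0)

    # Regularized rank-k approximation of the frame matrix.
    D_low = (U[:, :k] * (filt * s_k)) @ Vt[:k, :]

    # Column-wise residual norm, used here as the saliency score.
    return np.linalg.norm(D - D_low, axis=0)


if __name__ == "__main__":
    # Toy data: 20 near-identical frames, with frame 7 perturbed.
    base = np.arange(1.0, 51.0)
    D = np.outer(base, np.ones(20))
    D[:, 7] += 10.0 * np.where(np.arange(50) % 2 == 0, -1.0, 1.0)
    scores = frame_saliency(D, rank=1, lam=1.0)
    print(int(np.argmax(scores)))
```

In this toy setting the perturbed frame receives the largest score because the rank-1 approximation captures the content shared by all frames and leaves the perturbation in the residual. How such a score would be combined with the reconstruction term of the salient dictionary learning objective is not described in the abstract and is left out here.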
