Video Summarization Via Multiview Representative Selection

Video content is inherently heterogeneous. To exploit different feature modalities in a diverse video collection for video summarization, we formulate the task as a multiview representative selection problem. The goal is to select visual elements that are representative of a video consistently across different views (i.e., feature modalities). We present multiview sparse dictionary selection with centroid co-regularization, a method that optimizes the representative selection in each view and enforces similarity among the view-specific selections by regularizing them towards a consensus selection. We also introduce a diversity regularizer that favors a selection of diverse representatives. The problem can be solved efficiently by alternating minimization using the fast iterative shrinkage-thresholding algorithm. Experiments on synthetic data and benchmark video datasets validate the effectiveness of the proposed approach for video summarization, in comparison with other video summarization methods and with representative selection methods such as K-medoids, sparse dictionary selection, and multiview clustering.
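
To make the building block concrete, the following is a minimal single-view sketch, not the method proposed here. It assumes the commonly used sparse dictionary selection objective, min_W 0.5 * ||X - XW||_F^2 + lam * ||W||_{2,1}, and solves it with the fast iterative shrinkage-thresholding algorithm. The multiview consensus term and the diversity regularizer are omitted, and the names `row_soft_threshold`, `sparse_dictionary_selection`, `lam`, and `n_iter` are illustrative assumptions.

```python
import numpy as np


def row_soft_threshold(W, tau):
    """Proximal operator of tau * ||W||_{2,1}: shrink the L2 norm of each row by tau."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale


def sparse_dictionary_selection(X, lam, n_iter=300):
    """Score each column of X (shape d x n) as a candidate representative.

    Assumed objective (single view only, not the full multiview formulation):
        min_W 0.5 * ||X - X W||_F^2 + lam * ||W||_{2,1}
    Rows of W with a large L2 norm mark the items selected as representatives.
    """
    n = X.shape[1]
    G = X.T @ X                          # Gram matrix, shape (n, n)
    L = max(np.linalg.norm(G, 2), 1e-12) # Lipschitz constant of the smooth term
    W = np.zeros((n, n))
    Z = W.copy()                         # extrapolated point used by FISTA
    t = 1.0
    for _ in range(n_iter):
        grad = G @ Z - G                 # gradient of 0.5 * ||X - X Z||_F^2
        W_next = row_soft_threshold(Z - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = W_next + ((t - 1.0) / t_next) * (W_next - W)  # momentum step
        W, t = W_next, t_next
    return np.linalg.norm(W, axis=1)     # one non-negative score per item
```

Under these assumptions, the top-k representatives could be read off as `np.argsort(scores)[::-1][:k]`. A larger `lam` drives more rows of `W` to zero and therefore yields fewer representatives.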
