Diversity-Aware Multi-Video Summarization

Most video summarization approaches focus on extracting a summary from a single video; in contrast, we propose an unsupervised framework for summarizing a collection of videos. We observe that each video in the collection may contain information that the other videos lack, so exploiting this complementarity can help create a diverse, informative summary. We develop a novel diversity-aware sparse optimization method for multi-video summarization that exploits the complementarity within the videos. Our approach extracts a multi-video summary that is both interesting and representative of the whole video collection. To solve the optimization problem efficiently, we develop an alternating minimization algorithm that minimizes the overall objective function with respect to one video at a time while keeping the other videos fixed. Moreover, we introduce a new benchmark data set, Tour20, that contains 140 videos with multiple manually created summaries, which were acquired in a controlled experiment. Finally, through extensive experiments on the new Tour20 data set and several other multi-view data sets, we show that the proposed approach clearly outperforms state-of-the-art methods on two problems: topic-oriented video summarization and multi-view video summarization in a camera network.
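The alternating scheme described above can be illustrated with a minimal sketch. This is not the paper's formulation, which the abstract does not spell out. It assumes a self-representation loss per video, a row-sparsity (L2,1) penalty that selects representative frames, and a hypothetical diversity term that penalizes selecting similar frames in two different videos. All names (`summarize`, `update_block`, `lam`, `beta`) and the choice of proximal gradient for each per-video subproblem are illustrative assumptions.

```python
import numpy as np

def row_norms(Z):
    """L2 norm of each row; a large norm marks a frame as a representative."""
    return np.linalg.norm(Z, axis=1)

def cosine_sim(A, B):
    """Nonnegative cosine similarity between the columns (frames) of A and B."""
    A = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
    B = B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-12)
    return np.clip(A.T @ B, 0.0, None)

def objective(X, Z, lam, beta):
    """Assumed joint objective: reconstruction + row sparsity + cross-video diversity."""
    val = 0.0
    for v in range(len(X)):
        val += 0.5 * np.linalg.norm(X[v] - X[v] @ Z[v]) ** 2
        val += lam * row_norms(Z[v]).sum()
        for u in range(v):  # each pair of videos is counted once
            S = cosine_sim(X[v], X[u])
            val += beta * row_norms(Z[v]) @ S @ row_norms(Z[u])
    return val

def update_block(Xv, Zv, w, n_inner=50):
    """Proximal gradient on one video's weighted group-lasso subproblem."""
    G = Xv.T @ Xv
    step = 1.0 / (np.linalg.norm(G, 2) + 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_inner):
        Y = Zv - step * (G @ Zv - G)  # gradient step on the reconstruction loss
        norms = np.linalg.norm(Y, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * w[:, None] / np.maximum(norms, 1e-12))
        Zv = shrink * Y  # row-wise soft thresholding
    return Zv

def summarize(X, lam=0.5, beta=1.0, n_outer=10):
    """Alternate over videos, updating one Z_v while all others stay fixed."""
    Z = [np.zeros((x.shape[1], x.shape[1])) for x in X]
    history = [objective(X, Z, lam, beta)]
    for _ in range(n_outer):
        for v in range(len(X)):
            # With the other videos fixed, the diversity term reduces to
            # per-frame weights on the row-sparsity penalty of video v.
            w = np.full(X[v].shape[1], lam)
            for u in range(len(X)):
                if u != v:
                    w += beta * cosine_sim(X[v], X[u]) @ row_norms(Z[u])
            Z[v] = update_block(X[v], Z[v], w)
        history.append(objective(X, Z, lam, beta))
    return Z, history

def top_k_frames(Zv, k):
    """Indices of the k frames with the largest row norms."""
    return np.argsort(-row_norms(Zv))[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    videos = [rng.standard_normal((8, 12)) for _ in range(3)]  # d x n_frames each
    Z, history = summarize(videos)
    print([top_k_frames(z, 3).tolist() for z in Z])
```

Under these assumptions, fixing the other videos turns the diversity term into a set of per-frame weights, so each per-video subproblem is a convex weighted group lasso. Since every block update is a descent step on the same joint objective, the recorded objective values should not increase across outer iterations.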
