Diverse Sequential Subset Selection for Supervised Video Summarization

Video summarization is a challenging problem with great application potential. Whereas prior approaches, largely unsupervised in nature, focus on sampling useful frames and assembling them as summaries, we consider video summarization as a supervised subset selection problem. Our idea is to have the system learn from human-created summaries how to select informative and diverse subsets, so as to best meet evaluation metrics derived from human-perceived quality. To this end, we propose the sequential determinantal point process (seqDPP), a probabilistic model for diverse sequential subset selection. seqDPP heeds the inherent sequential structure of video data, thus overcoming a deficiency of the standard DPP, which treats video frames as randomly permutable items. At the same time, seqDPP retains the power to model diverse subsets, which is essential for summarization. Extensive experiments on summarizing videos from three datasets show that our method outperforms not only existing unsupervised methods but also naive applications of the standard DPP model.
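To make the notion of "modeling diverse subsets" concrete, the following is a minimal sketch of the standard (non-sequential) DPP in its L-ensemble form, where a subset Y has probability det(L_Y) / det(L + I). It does not implement seqDPP, whose exact formulation is not given in this abstract. The 3x3 kernel `L` and the helper names `det` and `dpp_prob` are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0  # determinant of the empty matrix is 1
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def dpp_prob(L, subset):
    """P(Y = subset) = det(L_subset) / det(L + I) for an L-ensemble DPP."""
    n = len(L)
    sub = [[L[i][j] for j in subset] for i in subset]
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    return det(sub) / det(L_plus_I)


# Hypothetical similarity kernel over three frames (assumed values):
# frames 0 and 1 are near-duplicates, frame 2 is visually distinct.
L = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

p_redundant = dpp_prob(L, [0, 1])  # two near-duplicate frames
p_diverse = dpp_prob(L, [0, 2])    # two dissimilar frames

# Probabilities over all 2^3 subsets form a valid distribution.
total = sum(dpp_prob(L, list(s))
            for k in range(len(L) + 1)
            for s in combinations(range(len(L)), k))
```

In this toy kernel the dissimilar pair receives a higher probability than the near-duplicate pair, which is the diversity property the abstract refers to. Note that `dpp_prob` depends only on which items are in the subset, not on their temporal order, which is the limitation of the standard DPP that seqDPP is proposed to address.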
