Creating Summaries from User Videos

This paper proposes a novel approach and a new benchmark for video summarization. We focus on user videos, which are raw videos containing a set of interesting events. Our method first segments the video using a novel “superframe” segmentation tailored to raw videos. We then estimate the visual interestingness of each superframe using a set of low-, mid-, and high-level features. Based on these scores, we select an optimal subset of superframes to create an informative and interesting summary. The benchmark comes with multiple human-created summaries, which were acquired in a controlled psychological experiment. This data makes it possible to evaluate summarization methods objectively and to gain new insights into video summarization. When evaluating our method, we find that it generates high-quality results, comparable to manual, human-created summaries.
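
The selection step can be illustrated with a small sketch. The abstract does not say how the optimal subset is computed, so the code below assumes one plausible formulation: a 0/1 knapsack in which each superframe has an interestingness score and a length in frames, and the summary must fit a length budget. The function name `select_superframes`, its parameters, and the example numbers are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def select_superframes(
    scores: List[float], lengths: List[int], budget: int
) -> Tuple[float, List[int]]:
    """Pick the subset of segments with maximal total score whose total
    length does not exceed `budget` (0/1 knapsack, dynamic programming).

    Assumption: `scores` are per-superframe interestingness values and
    `lengths` are integer segment lengths in frames.
    """
    n = len(scores)
    # best[i][b] = best total score using the first i segments within length b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        length, score = lengths[i - 1], scores[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip segment i-1
            if length <= b:
                candidate = best[i - 1][b - length] + score  # option 2: take it
                if candidate > best[i][b]:
                    best[i][b] = candidate

    # Backtrack to recover which segments were taken.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


# Hypothetical example: four superframes and a 90-frame summary budget.
total, picked = select_superframes(
    scores=[0.9, 0.2, 0.7, 0.4], lengths=[40, 30, 50, 20], budget=90
)
print(picked)  # indices of the selected superframes, in temporal order
```

With these made-up numbers the sketch selects superframes 0 and 2, which together use the full 90-frame budget. The table has one row per superframe and one column per frame of budget, so the cost grows with the product of the two.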
