ElasticPlay: Interactive Video Summarization with Dynamic Time Budgets

Video consumption is shifting from sit-and-watch viewing to selective skimming. Existing video player interfaces, however, support this emerging behavior only through indirect manipulation. Video summarization alleviates the problem to some extent by shortening a video to a desired summary length given as an input. But the optimal length of a summary is often not known in advance. Moreover, users cannot edit a summary once it has been produced, which limits its practical applications. We argue that video summarization should be an interactive, mixed-initiative process in which users control the summarization procedure while algorithms help them achieve their goals through video understanding. In this paper, we introduce ElasticPlay, a mixed-initiative approach that combines an advanced video summarization technique with direct interface manipulation to help users control the video summarization process. While watching a video, users can specify a time budget for the remaining content; our system then immediately updates the playback plan using our proposed cut-and-forward algorithm, which determines which parts to skip and which to fast-forward. This interactive process allows users to fine-tune the summarization result with immediate feedback. We show that our system outperforms existing video summarization techniques on the TVSum50 dataset. We also report two lab studies (22 participants) and a Mechanical Turk deployment study (60 participants), and show that the participants responded favorably to ElasticPlay.
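
To make the idea of a playback plan under a time budget concrete, the following is a minimal illustrative sketch in Python. It is not the cut-and-forward algorithm proposed in the paper, whose details are not given in this abstract. It assumes that the remaining video is already divided into segments with a per-segment importance score, and it uses a simple greedy rule: fast-forward the least important segments first, and start skipping segments only if fast-forwarding everything still exceeds the budget. The names `Segment`, `plan_playback`, and `fast_rate`, the score range, and the greedy rule itself are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # start time in seconds
    duration: float    # length in seconds at normal speed
    importance: float  # hypothetical score, higher means more important


def plan_playback(segments, budget, fast_rate=2.0):
    """Greedy sketch (an assumption, not the paper's method).

    Returns (actions, total), where actions[i] is "play", "fast", or "skip"
    for segments[i], and total is the planned playback time in seconds.
    """
    actions = ["play"] * len(segments)
    total = sum(s.duration for s in segments)

    # Visit segments from least to most important.
    order = sorted(range(len(segments)), key=lambda i: segments[i].importance)

    # Pass 1: fast-forward low-importance segments until the budget is met.
    for i in order:
        if total <= budget:
            break
        total -= segments[i].duration * (1 - 1 / fast_rate)
        actions[i] = "fast"

    # Pass 2: reached with total > budget only if every segment is already
    # fast-forwarded, so skipping one saves its fast-forwarded duration.
    for i in order:
        if total <= budget:
            break
        total -= segments[i].duration / fast_rate
        actions[i] = "skip"

    return actions, total


segments = [
    Segment(start=0, duration=10, importance=0.9),
    Segment(start=10, duration=20, importance=0.1),
    Segment(start=30, duration=30, importance=0.5),
    Segment(start=60, duration=40, importance=0.3),
]
actions_75, total_75 = plan_playback(segments, budget=75)
actions_40, total_40 = plan_playback(segments, budget=40)
```

Because the plan is recomputed from the remaining segments and the current budget, a user could change the budget mid-playback and call `plan_playback` again on whatever has not yet been watched, which is the kind of immediate update the abstract describes.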