QuickCut: An Interactive Tool for Editing Narrated Video

We present QuickCut, an interactive video editing tool designed to help authors efficiently edit narrated video. QuickCut takes as input an audio recording of the narration voiceover and a collection of raw video footage. Users review the raw footage and provide spoken annotations describing the relevant actions and objects in the scene. QuickCut then time-aligns a transcript of the annotations to the raw footage and a transcript of the narration to the voiceover. These aligned transcripts let authors quickly match story events in the narration with semantically relevant video segments and form alignment constraints between them. Given a set of such constraints, QuickCut applies dynamic programming optimization to choose frame-level cut points between the video segments while maintaining the alignments with the narration and adhering to low-level film editing guidelines. We demonstrate QuickCut's effectiveness by using it to generate a variety of short (less than 2 minutes) narrated videos. Each result required between 14 and 52 minutes of user time to edit (i.e., between 8 and 31 minutes for each minute of output video), which is far less than typical authoring times with existing video editing workflows.
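The dynamic programming step can be illustrated with a minimal sketch. It is not QuickCut's actual formulation: the abstract does not specify the cost terms, so the function name `choose_cut_points`, the per-boundary candidate lists, and the two cost callbacks are all assumptions made for illustration. The sketch picks one candidate frame per cut boundary so that the sum of a per-cut cost (for example, distance from the frame implied by a narration alignment) and a between-cut cost (for example, a penalty for shots that are too short) is minimal.

```python
from typing import Callable, List, Sequence, Tuple


def choose_cut_points(
    candidates: Sequence[Sequence[int]],
    unary_cost: Callable[[int, int], float],
    pairwise_cost: Callable[[int, int, int], float],
) -> Tuple[float, List[int]]:
    """Pick one candidate frame per cut boundary with minimal total cost.

    candidates[i]           -- hypothetical candidate frames for boundary i
    unary_cost(i, f)        -- assumed cost of placing boundary i at frame f
    pairwise_cost(i, p, f)  -- assumed cost of boundary i-1 at p followed by
                               boundary i at f
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("each boundary needs at least one candidate frame")

    # best[j]: minimal cost of any choice sequence ending at candidates[i][j].
    best = [unary_cost(0, f) for f in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor index

    for i in range(1, len(candidates)):
        new_best, pointers = [], []
        for f in candidates[i]:
            costs = [
                best[k] + pairwise_cost(i, prev, f)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + unary_cost(i, f))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last boundary to the first.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [candidates[-1][j]]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i - 1][j]
        path.append(candidates[i - 1][j])
    path.reverse()
    return total, path
```

With n boundaries and at most m candidate frames per boundary, this runs in O(n·m²) time, compared with the mⁿ combinations an exhaustive search would have to score.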
