SceneSkim: Searching and Browsing Movies Using Synchronized Captions, Scripts and Plot Summaries

Searching for scenes in movies is a time-consuming but crucial task for film studies scholars, film professionals, and new media artists. In pilot interviews, we found that such users search for a wide variety of clips (e.g., actions, props, dialogue phrases, character performances, locations) and that they return to particular scenes they have seen in the past. Today, these users find relevant clips by watching the entire movie, scrubbing the video timeline, or navigating via DVD chapter menus. Increasingly, users can also index films through transcripts; however, dialogue often lacks visual context, character names, and high-level event descriptions. We introduce SceneSkim, a tool for searching and browsing movies using synchronized captions, scripts, and plot summaries. Our interface integrates information from these sources to allow expressive search at several levels of granularity: captions provide access to accurate dialogue, scripts describe shot-by-shot actions and settings, and plot summaries contain high-level event descriptions. We propose new algorithms for finding word-level caption-to-script alignments, parsing text scripts, and aligning plot summaries to scripts. Film studies graduate students who evaluated SceneSkim expressed enthusiasm about the usability of the system for their research and teaching.
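To make the idea of a word-level caption-to-script alignment concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that caption and script dialogue are available as plain word lists, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in matcher. The function names `normalize` and `align_words` and the sample sentences are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and drop punctuation so near-identical tokens compare equal."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def align_words(caption_words, script_words):
    """Return (caption_index, script_index) pairs for words matched in order.

    Assumption: difflib's matching-block heuristic stands in for the
    alignment method; it is not the method described in the paper.
    """
    a = [normalize(w) for w in caption_words]
    b = [normalize(w) for w in script_words]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of `size` equal words starting at a[block.a], b[block.b].
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs


# Hypothetical example: the caption holds the spoken line, while the script
# line carries a speaker name and slightly different wording.
caption = "I've got a bad feeling about this".split()
script = "HAN I have a bad feeling about this".split()
pairs = align_words(caption, script)
for ci, si in pairs:
    print(caption[ci], "<->", script[si])
```

Because captions are time-coded, each matched pair could in principle carry a caption timestamp over to the corresponding script word, which is what would let script text such as character names and action descriptions be used to locate moments in the video.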
