Comprehensible Video Thumbnails

We present the Comprehensible Video Thumbnail: an automatically generated visual précis that summarizes the salient objects in a video clip and their dynamics. Salient moving objects are detected using a novel stochastic sampling technique that identifies, clusters, and then tracks regions exhibiting affine motion coherence within the clip. Tracks are analyzed to determine salient instants at which motion and/or appearance changes significantly, and the resulting objects are arranged in a stylized composition optimized to reduce visual clutter and to enhance understanding of scene content through classification and depiction of motion type and trajectory. The result is an object-level visual gist of the clip, obtained fully automatically and depicting content and motion with greater descriptive power than prior approaches. We demonstrate these benefits through a user study in which comprehension of our video thumbnails is compared with that of state-of-the-art alternatives across a wide variety of sports footage.
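
As a rough illustration of what detecting "salient instants" on an object track could look like, the sketch below flags frames where the direction or speed of a 2D trajectory changes sharply. It is not the method of the paper, which the abstract does not specify: the function name `salient_instants`, the thresholds, and the restriction to motion cues (appearance changes are ignored) are all assumptions made for this example.

```python
import math


def salient_instants(track, angle_thresh_deg=45.0, speed_ratio_thresh=2.0):
    """Return indices into `track` where the motion changes sharply.

    track: list of (x, y) object positions, one per frame (hypothetical input).
    The thresholds are illustrative defaults, not values from the paper.
    """
    # Frame-to-frame displacement vectors; vels[i] goes from track[i] to track[i + 1].
    vels = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]
    instants = []
    for i in range(1, len(vels)):
        (ax, ay), (bx, by) = vels[i - 1], vels[i]
        speed_in, speed_out = math.hypot(ax, ay), math.hypot(bx, by)
        if speed_in == 0.0 or speed_out == 0.0:
            # Starting or stopping counts as a change; remaining still does not.
            if speed_in != speed_out:
                instants.append(i)
            continue
        # Turning angle between incoming and outgoing displacement at track[i].
        cos_turn = (ax * bx + ay * by) / (speed_in * speed_out)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_turn))))
        ratio = max(speed_in, speed_out) / min(speed_in, speed_out)
        if angle > angle_thresh_deg or ratio > speed_ratio_thresh:
            instants.append(i)
    return instants


# An object moving right for three frames, then turning upward at (3, 0).
corner_track = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
print(salient_instants(corner_track))  # prints [3]
```

A real system would work on noisy tracks, so some smoothing of the trajectory before thresholding would likely be needed; that step is omitted here for brevity.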
