Attention-guided Algorithms to Retarget and Augment Animations, Stills, and Videos

Still pictures, animations, and videos are used by artists to tell stories visually. Computer graphics algorithms create visual stories too, either automatically or by assisting artists. Why is it so hard to create algorithms that perform like a trained visual artist? The reason is that artists think about where a viewer will look and how the viewer's attention will flow across the scene, whereas algorithms do not have a similarly sophisticated understanding of the viewer. Our key insight is that computer graphics algorithms should be designed to take into account how viewer attention is allocated. We first show that designing optimization terms based on viewers' attentional priorities allows an algorithm to handle artistic license in the input data, such as geometric inconsistencies in hand-drawn shapes. We then show that measurements of viewer attention enable algorithms to infer high-level information about a scene, for example, the object of storytelling interest in every frame of a video.

All the presented algorithms retarget or augment a traditional form of visual art. Traditional art includes artwork such as printed comics, i.e., pictures that were created before computers became mainstream. It also refers to artwork that can be created the way it was before computers, for example, hand-drawn animation and live-action films. Connecting traditional art with computational algorithms allows us to leverage the unique strengths of each side. We demonstrate these ideas on three applications.

Retargeting and augmenting animations: Two widely practiced forms of animation are two-dimensional (2D) hand-drawn animation and three-dimensional (3D) computer animation. To apply the techniques of the 3D medium to 2D animation, researchers have attempted to compute 3D reconstructions of the shape and motion of the hand-drawn character, which are meant to act as its 'proxy' in the 3D environment. We argue that a perfect reconstruction is excessive because it does not leverage the characteristics of viewer attention. We present algorithms to generate a 3D proxy at different levels of detail, such that at each level the error terms account for quantities that will attract viewer attention. These algorithms allow a hand-drawn animation to be retargeted to a 3D skeleton and augmented with physically simulated secondary effects.

Augmenting stills: Moves-on-stills is a technique to engage the viewer while presenting still pictures on television or in movies. The effect is widely used to augment comics into 'motion comics'. Though state-of-the-art software such as iMovie allows a user to specify the parameters of a camera move, it does not address how those parameters should be chosen. We believe that a good camera move respects the visual route designed by the artist who crafted the still picture; if we record the gaze of viewers looking at composed still pictures, we can reconstruct the artist's intention. We show, through a perceptual study, that the artist succeeds in directing viewer attention in comic book pictures, and we present an algorithm that predicts the parameters of camera moves-on-stills from statistics derived from eyetracking data.

Retargeting video: Video retargeting is the process of altering an original video to fit a new display size while best preserving content and minimizing artifacts. Recent techniques define content in terms of color, edges, faces, and other image-based saliency features. We suggest that content is, in fact, what people look at. We introduce a novel operator that extends the classic "pan-and-scan" by introducing cuts in addition to automatic pans based on viewer eyetracking data. We also present a gaze-based evaluation criterion to quantify the performance of our operator.
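The abstract does not specify the error terms used to fit the attention-aware 3D proxy. As a purely illustrative sketch (not the thesis' actual formulation), one could imagine an attention-weighted reprojection error in which a hypothetical per-point weight is larger where viewers are expected to look:

    # Minimal sketch, assuming a hypothetical attention weight per drawn point;
    # not the thesis' actual error terms.
    import numpy as np

    def reprojection_error(proxy_points_3d, drawn_points_2d, attention_weights, camera):
        """Sum of attention-weighted squared reprojection errors.

        proxy_points_3d:   (N, 3) candidate 3D proxy positions
        drawn_points_2d:   (N, 2) corresponding points on the hand drawing
        attention_weights: (N,)   larger where viewers are expected to look
        camera:            3x4 projection matrix (assumed known)
        """
        homog = np.hstack([proxy_points_3d, np.ones((len(proxy_points_3d), 1))])
        projected = (camera @ homog.T).T
        projected = projected[:, :2] / projected[:, 2:3]      # perspective divide
        residuals = np.linalg.norm(projected - drawn_points_2d, axis=1)
        return np.sum(attention_weights * residuals ** 2)

Under such a weighting, geometric inconsistencies in regions that viewers rarely fixate contribute little to the objective, which is one way an optimizer could tolerate artistic license in the drawing.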
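For moves-on-stills, the abstract states only that camera-move parameters are predicted from statistics of eyetracking data. A minimal, hypothetical heuristic in that spirit (not the thesis algorithm) is to start the move framed on the earliest fixations and end it framed on the full set of fixations; the padding and early-fraction values below are invented for illustration:

    # Minimal sketch of a gaze-driven pan-and-zoom move-on-still.
    import numpy as np

    def fit_window(points, aspect, pad=1.3):
        """Smallest window of the given aspect ratio that covers `points`, padded."""
        lo, hi = points.min(axis=0), points.max(axis=0)
        center = (lo + hi) / 2.0
        w, h = (hi - lo) * pad
        w = max(w, h * aspect)            # enforce the target aspect ratio
        h = w / aspect
        return center, np.array([w, h])

    def camera_move(fixations, times, aspect=16 / 9, early_fraction=0.3):
        """Return (start_center, start_size, end_center, end_size) of a linear move."""
        order = np.argsort(times)
        pts = np.asarray(fixations, dtype=float)[order]
        n_early = max(2, int(early_fraction * len(pts)))
        start = fit_window(pts[:n_early], aspect)  # frame where viewers look first
        end = fit_window(pts, aspect)              # end on the full composition
        return (*start, *end)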
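For the gaze-driven pan-and-scan operator with cuts, a rough sketch of the idea (with assumed thresholds, not the thesis operator) is to pan the crop window toward the per-frame gaze location and to cut when the gaze jumps farther than a pan can comfortably follow; a simple gaze-based score then measures how many fixations the cropped video retains:

    # Minimal sketch of a gaze-driven pan-and-cut path plus a gaze-coverage score.
    import numpy as np

    def crop_path(gaze_x, crop_w, frame_w, max_pan=4.0, cut_threshold=200.0):
        """Per-frame left edge of the crop window, panning toward gaze or cutting."""
        left = np.empty(len(gaze_x))
        left[0] = np.clip(gaze_x[0] - crop_w / 2, 0, frame_w - crop_w)
        for t in range(1, len(gaze_x)):
            target = np.clip(gaze_x[t] - crop_w / 2, 0, frame_w - crop_w)
            step = target - left[t - 1]
            if abs(step) > cut_threshold:   # gaze jumped far away: cut instead of panning
                left[t] = target
            else:                           # otherwise pan, limited to max_pan px/frame
                left[t] = left[t - 1] + np.clip(step, -max_pan, max_pan)
        return left

    def gaze_coverage(left, crop_w, fixations_x):
        """Evaluation: fraction of per-frame fixations kept inside the crop window."""
        fixations_x = np.asarray(fixations_x, dtype=float)
        inside = (fixations_x >= left) & (fixations_x <= left + crop_w)
        return float(inside.mean())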
