Action snapshot with single pose and viewpoint

Many art forms present visual content as a single image captured from a particular viewpoint. Selecting a meaningful, representative moment from an action performance is difficult, even for an experienced artist, yet a well-chosen image can tell a story on its own. This matters in a range of narrative scenarios, such as journalists reporting breaking news, scholars presenting their research, or artists crafting artworks. We address the underlying structures and mechanisms of pictorial narrative with a new concept, called the action snapshot, which automates the generation of a meaningful snapshot (a single still image) from an input sequence of dynamic scenes, possibly containing several fully animated, interacting characters. We propose a novel method based on information theory to quantitatively evaluate the information conveyed by a pose. Taking the top-ranked postures as input, a convolutional neural network is constructed and trained with deep reinforcement learning to select the single viewpoint that maximally conveys the information of the sequence. User studies comparing the computer-selected poses and viewpoints with those chosen by human participants show that the proposed method effectively assists in selecting the most informative snapshot from animation-intensive scenarios.
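The information-theoretic evaluation described above builds on the classical viewpoint-entropy idea from the viewpoint-selection literature: a view is scored by the Shannon entropy of the visible-area distribution of the scene's faces, so views that reveal many parts of the scene evenly score higher. The abstract does not give the paper's exact formula, so the following is only a minimal sketch of the standard viewpoint-entropy measure; the function name and inputs are illustrative choices, not the authors' API.

```python
import numpy as np

def viewpoint_entropy(projected_areas, background_area=0.0):
    """Shannon entropy (in bits) of the visible-area distribution
    for a single candidate viewpoint.

    projected_areas: projected screen-space area of each visible face.
    background_area: optional area not covered by the model.
    A higher value means the view exposes many faces more evenly,
    which the classical measure treats as more informative.
    """
    areas = np.asarray(projected_areas, dtype=float)
    total = areas.sum() + background_area
    p = areas[areas > 0.0] / total          # relative visibility of each face
    return float(-(p * np.log2(p)).sum())   # Shannon entropy

# A view spreading visibility evenly over four faces scores higher
# than one dominated by a single face.
even = viewpoint_entropy([1.0, 1.0, 1.0, 1.0])   # -> 2.0 bits
skew = viewpoint_entropy([3.7, 0.1, 0.1, 0.1])   # well below 2.0
```

In a pose/viewpoint-ranking pipeline of the kind the abstract outlines, a score like this could be computed per candidate camera and per pose, with the top-scoring candidates passed on to the learned viewpoint selector.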
