Pose2Pose: pose selection and transfer for 2D character animation

An artist faces two challenges when creating a 2D animated character to mimic a specific human performance. First, the artist must design and draw a collection of artwork depicting portions of the character in a suitable set of poses, for example arm and hand poses that can be selected and combined to express the range of gestures typical for that person. Next, to depict a specific performance, the artist must select and position the appropriate set of artwork at each moment of the animation. This paper presents a system that addresses these challenges by leveraging video of the target human performer. Our system tracks arm and hand poses in an example video of the target. The UI displays clusters of these poses to help artists select representative poses that capture the actor's style and personality. From this mapping of pose data to character artwork, our system can generate an animation from a new performance video. It relies on a dynamic programming algorithm to optimize for smooth animations that match the poses found in the video. Artists used our system to create four 2D characters and were pleased with the final automatically animated results. We also describe additional applications addressing audio-driven or text-based animations.
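To make the dynamic programming step concrete, the following is a minimal sketch of one way such an assignment could work. It is not the paper's actual formulation, which the abstract does not specify. It assumes a hypothetical table `frame_costs[t][k]` giving the mismatch between the tracked pose in video frame `t` and artwork pose `k`, and a hypothetical `switch_penalty` charged whenever consecutive frames use different artwork; both names and the cost model are illustrative assumptions. A Viterbi-style recurrence then finds the sequence that minimizes total mismatch plus switching cost.

```python
from typing import List, Sequence


def assign_poses(frame_costs: Sequence[Sequence[float]],
                 switch_penalty: float) -> List[int]:
    """Pick one artwork pose per frame with a Viterbi-style DP.

    frame_costs[t][k]: assumed mismatch between frame t and artwork pose k.
    switch_penalty:    assumed non-negative cost of changing pose between
                       consecutive frames (higher favors smoother output).
    """
    if not frame_costs:
        return []
    num_poses = len(frame_costs[0])

    # best[k] = cheapest total cost of any sequence ending in pose k.
    best = list(frame_costs[0])
    back: List[List[int]] = []  # back[t-1][k] = predecessor of pose k at frame t

    for t in range(1, len(frame_costs)):
        # With a uniform penalty, the best pose to switch *from* is the
        # globally cheapest predecessor.
        cheapest_prev = min(range(num_poses), key=best.__getitem__)
        new_best, pointers = [], []
        for k in range(num_poses):
            stay = best[k]
            switch = best[cheapest_prev] + switch_penalty
            if stay <= switch:
                prev, cost = k, stay
            else:
                prev, cost = cheapest_prev, switch
            new_best.append(cost + frame_costs[t][k])
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final pose.
    k = min(range(num_poses), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path


# Toy data: frame 2 slightly prefers pose 1, all other frames prefer pose 0.
costs = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.9], [0.0, 1.0]]
greedy = assign_poses(costs, switch_penalty=0.0)   # [0, 0, 1, 0]
smooth = assign_poses(costs, switch_penalty=0.5)   # [0, 0, 0, 0]
```

With no penalty, the result follows the per-frame best match and flickers to pose 1 for a single frame. With a penalty of 0.5, the brief switch is no longer worth its cost, so the sequence holds pose 0 throughout. This illustrates the trade-off between matching the video and keeping the animation smooth.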
