Neural Puppet: Generative Layered Cartoon Characters

We propose a learning-based method for generating new animations of a cartoon character given a few example images. Our method is designed to learn from a traditionally animated sequence, where each frame is drawn by an artist, and thus the input images lack any common structure, correspondences, or labels. We express pose changes as a deformation of a layered 2.5D template mesh, and devise a novel architecture that learns to predict mesh deformations matching the template to a target image. This enables us to extract a common low-dimensional structure from a diverse set of character poses. We combine recent advances in differentiable rendering with mesh-aware models to successfully align a common template even if only a few character images are available during training. In addition to coarse poses, character appearance also varies due to shading, out-of-plane motions, and artistic effects. We capture these subtle changes by applying an image translation network to refine the mesh rendering, providing an end-to-end model to generate new animations of a character with high visual quality. We demonstrate that our generative model can be used to synthesize in-between frames and to create data-driven deformations. Our template fitting procedure outperforms state-of-the-art generic techniques for detecting image correspondences.
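To make the idea of a layered 2.5D template concrete, the following is a minimal illustrative sketch, not the paper's architecture. It assumes a template is a set of 2D mesh layers with a drawing order, and that a pose is expressed as per-vertex offsets from the rest shape. The names `Layer`, `deform`, `edge_stretch_penalty`, and `draw_order` are hypothetical, and the edge-length penalty is only a simple stand-in for whatever regularization a real fitting procedure would use. The learned deformation predictor, the differentiable renderer, and the image translation network are all omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Layer:
    """One layer of a hypothetical 2.5D template (e.g. a body part)."""
    name: str
    depth: int                     # drawing order: lower values are drawn first
    vertices: List[Point]          # rest-pose 2D vertex positions
    edges: List[Tuple[int, int]]   # pairs of vertex indices


def deform(layer: Layer, offsets: List[Point]) -> List[Point]:
    """Return deformed vertex positions: rest pose plus per-vertex offsets."""
    if len(offsets) != len(layer.vertices):
        raise ValueError("expected one offset per vertex")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(layer.vertices, offsets)]


def edge_stretch_penalty(layer: Layer, deformed: List[Point]) -> float:
    """Sum of squared edge-length changes (assumed stand-in regularizer)."""
    total = 0.0
    for i, j in layer.edges:
        rest_len = math.dist(layer.vertices[i], layer.vertices[j])
        new_len = math.dist(deformed[i], deformed[j])
        total += (new_len - rest_len) ** 2
    return total


def draw_order(layers: List[Layer]) -> List[str]:
    """Names of the layers sorted back to front by depth."""
    return [layer.name for layer in sorted(layers, key=lambda l: l.depth)]


# A toy two-layer template: a triangular "body" behind a triangular "arm".
triangle_edges = [(0, 1), (1, 2), (2, 0)]
body = Layer("body", 0, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], triangle_edges)
arm = Layer("arm", 1, [(1.0, 0.0), (2.0, 0.0), (1.0, 1.0)], triangle_edges)

# A pure translation keeps every edge length, so the penalty is zero.
moved = deform(body, [(0.5, 0.5)] * 3)
# Dragging a single vertex stretches two edges and is penalized.
stretched = deform(body, [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)])
```

In this toy setting a predictor would output the `offsets` for each layer, and a penalty such as `edge_stretch_penalty` would discourage implausible distortions while the rendered layers are compared against the target image.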
