paGAN: real-time avatars using dynamic textures

With the rising interest in personalized VR and gaming experiences comes the need to create high-quality 3D avatars that are both low-cost and varied. As a result, building dynamic avatars from a single unconstrained input image is becoming a popular application. While previous techniques that attempt this require multiple input images or rely on transferring dynamic facial appearance from a source actor, we are able to do so using only one 2D input image, without any form of transfer from a source image. We achieve this using a new conditional Generative Adversarial Network design that allows fine-scale manipulation of any facial input image into a new expression while preserving its identity. Our photoreal avatar GAN (paGAN) can also synthesize the unseen mouth interior and control the eye-gaze direction of the output, as well as produce the final image from a novel viewpoint. The method is even capable of generating fully controllable, temporally stable video sequences, despite not using temporal information during training. After training, we can use our network to produce dynamic image-based avatars that are controllable on mobile devices in real time. To do this, we compute a fixed set of output images that correspond to key blendshapes, from which we extract textures in UV space. Using a subject's expression blendshape coefficients at run-time, we can linearly blend these key textures together to achieve the desired appearance. Furthermore, we can use the mouth interior and eye textures produced by our network to synthesize on-the-fly avatar animations for those regions. Our work produces state-of-the-art quality image and video synthesis, and is the first to our knowledge that is able to generate a dynamically textured avatar with a mouth interior, all from a single image.
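The run-time step described above (linearly blending precomputed key textures by the subject's blendshape coefficients) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the array shapes, the function name, and the toy textures are all assumptions introduced here for clarity.

```python
import numpy as np

def blend_key_textures(key_textures, weights):
    """Linearly blend precomputed key-expression UV textures.

    key_textures: (K, H, W, 3) array, one texture per key blendshape
                  (hypothetically produced offline by the network).
    weights:      (K,) per-frame blendshape coefficients, assumed
                  non-negative and summing to at most 1.
    """
    weights = np.asarray(weights, dtype=np.float32)
    textures = np.asarray(key_textures, dtype=np.float32)
    # Weighted sum over the key-texture axis yields this frame's
    # dynamic texture, with shape (H, W, 3).
    return np.tensordot(weights, textures, axes=1)

# Toy example: three constant 4x4 key textures.
keys = np.stack(
    [np.full((4, 4, 3), v, dtype=np.float32) for v in (0.0, 0.5, 1.0)]
)
frame = blend_key_textures(keys, [0.2, 0.3, 0.5])
```

Because the blend is a single weighted sum per texel, it is cheap enough to run per-frame on mobile hardware, which is consistent with the real-time claim above.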
