Realistic Dynamic Facial Textures from a Single Image Using GANs

We present a novel method to realistically puppeteer and animate a face from a single RGB image using a source video sequence. We begin by fitting a multilinear PCA model to obtain the 3D geometry and a single texture of the target face. For the animation to be realistic, however, we need dynamic per-frame textures that capture the subtle wrinkles and deformations corresponding to the animated facial expressions. This problem is highly underconstrained, as dynamic textures cannot be obtained directly from a single image. Furthermore, if the target face has a closed mouth, it is not possible to obtain actual images of the mouth interior. To address this issue, we train a deep generative network that infers realistic per-frame texture deformations of the target identity, including the mouth interior, from the per-frame source textures and the single target texture. By retargeting the PCA expression geometry from the source, together with the newly inferred textures, we can both animate the face and perform video face replacement on the source video using the target appearance.
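The inference step described above conditions a generative network on two inputs per frame: the animated source texture and the single static target texture, producing a per-frame target texture. The sketch below illustrates only that conditioning structure, with a hypothetical placeholder in place of the trained network; all function names, the 256×256 texture resolution, and the blending stand-in are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def make_generator_input(target_texture, source_frame_texture):
    """Channel-wise concatenation: the network is conditioned on both the
    static target texture and the current source-frame texture (assumed
    input layout; the paper does not specify the exact encoding)."""
    return np.concatenate([target_texture, source_frame_texture], axis=-1)

def generator(x):
    """Stand-in for the trained deep generative network. Here we simply
    average the two conditioning textures channel-wise; the real model
    would synthesize wrinkles and the mouth interior."""
    target_part, source_part = x[..., :3], x[..., 3:]
    return 0.5 * target_part + 0.5 * source_part

def infer_dynamic_textures(target_texture, source_frame_textures):
    """Run the (placeholder) network once per source frame to obtain a
    dynamic per-frame texture sequence for the target identity."""
    return [generator(make_generator_input(target_texture, frame))
            for frame in source_frame_textures]

# Usage with dummy data: one target texture, four source frames.
target = np.random.rand(256, 256, 3).astype(np.float32)
frames = [np.random.rand(256, 256, 3).astype(np.float32) for _ in range(4)]
dynamic_textures = infer_dynamic_textures(target, frames)
```

Each inferred texture is then applied to the retargeted PCA geometry to render the final animated or replaced face.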
