GIF: Generative Interpretable Faces

Photo-realistic visualization and animation of expressive human faces have been a long-standing challenge. 3D face modeling methods provide parametric control but generate unrealistic images. Generative 2D models like GANs (Generative Adversarial Networks), on the other hand, output photo-realistic face images but lack explicit control. Recent methods gain partial control, either by attempting to disentangle different factors in an unsupervised manner or by adding control post hoc to a pre-trained model. Unconditional GANs, however, may entangle factors in ways that are hard to undo later. We condition our generative model on pre-defined control parameters to encourage disentanglement in the generation process. Specifically, we condition StyleGAN2 on FLAME, a generative 3D face model. While conditioning on FLAME parameters yields unsatisfactory results, we find that conditioning on rendered FLAME geometry and photometric details works well. This gives us a generative 2D face model named GIF (Generative Interpretable Faces) that offers FLAME’s parametric control. Here, interpretable refers to the semantic meaning of the different parameters. Given FLAME parameters for shape, pose, and expression, parameters for appearance and lighting, and an additional style vector, GIF outputs photo-realistic face images. We perform an Amazon Mechanical Turk (AMT) based perceptual study to quantitatively and qualitatively evaluate how well GIF follows its conditioning. The code, data, and trained model are publicly available for research purposes at http://gif.is.tue.mpg.de.
