FreeStyleGAN

Current Generative Adversarial Networks (GANs) produce photorealistic renderings of portrait images. Embedding real images into the latent space of such models enables high-level image editing. While recent methods provide considerable semantic control over the (re-)generated images, they can only generate a limited set of viewpoints and cannot explicitly control the camera. Such 3D camera control is required for 3D virtual and mixed reality applications. In our solution, we use a few images of a face to perform 3D reconstruction, and we introduce the notion of the GAN camera manifold, the key element that allows us to precisely define the range of images the GAN can reproduce in a stable manner. We train a small face-specific neural implicit representation network to map a captured face to this manifold and complement it with a warping scheme to obtain free-viewpoint novel-view synthesis. We show how our approach, thanks to its precise camera control, enables the integration of a pre-trained StyleGAN into standard 3D rendering pipelines, allowing, e.g., stereo rendering or consistent insertion of faces in synthetic 3D environments. Our solution provides the first truly free-viewpoint rendering of realistic faces at interactive rates, using only a small number of casual photos as input, while also supporting semantic edits such as changes to facial expression or lighting.
