3D-Aware Video Generation

Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled the generation of high-quality 3D or video content that exhibits either multi-view or temporal consistency. In this work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D video using only monocular videos as supervision. We show that our method learns a rich embedding of decomposable 3D structures and motions that enables new visual effects in spatio-temporal rendering, while producing imagery with quality comparable to that of existing 3D or video GANs.
